advanced · deep dive

CDC and eventing

Outbox, change-data-capture, derived views, dual-write avoidance, and replay safety.

~10 min read

Change-data-capture is how you keep read models fresh without dual writes. The DB's log is already your most reliable event stream — use it.

Read this if your last attempt…

You have a DB and a search index / cache / analytics warehouse that need to stay in sync
You wrote "we'll dual-write to the DB and to Kafka" and moved on
You don't know what Debezium is
You confuse CDC with event sourcing

The concept

Change-data-capture (CDC) means tailing the database's transaction log (WAL in Postgres, binlog in MySQL) and emitting a stream of change events to downstream consumers. Every committed change becomes an event; nothing else does. This solves the dual-write problem (DB + Kafka in separate writes can drift).
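To make the drift concrete, here is a minimal in-memory sketch. The lists standing in for the DB and the broker, the `Crash` exception, and the function names are all illustrative assumptions, not any real client API. It shows a dual write losing an event when the process dies between the two writes, while publishing from the committed log does not.

```python
class Crash(Exception):
    """Stands in for the process dying between the two writes."""

def dual_write(db, broker, order, crash_between=False):
    db.append(order)                 # write 1: the DB commit succeeds
    if crash_between:
        raise Crash("died after DB commit, before publish")
    broker.append(order)             # write 2: never reached on a crash

def publish_from_log(db_log, broker, start=0):
    # CDC-style: events are derived from what actually committed.
    for entry in db_log[start:]:
        broker.append(entry)
    return len(db_log)               # next position to resume from

db, broker = [], []
dual_write(db, broker, {"order_id": 1})
try:
    dual_write(db, broker, {"order_id": 2}, crash_between=True)
except Crash:
    pass

missing = [o for o in db if o not in broker]   # order 2 never reached the broker

cdc_broker = []
next_pos = publish_from_log(db, cdc_broker)    # both committed orders are emitted
```

The dual-write path has no record that order 2 was never published. The log-derived path only needs to remember a position in the log.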

Architecture diagram: DB write → WAL → CDC → Kafka → consumers

The DB's transaction log is the source of truth for "what changed". Everything downstream builds on it.

Four ways to get events out of a DB, of which only three are correct:

Pattern	Correctness	Cost
Dual write	Broken (race condition)	Low until it fails
Outbox (DIY)	Correct	One extra table + a polling/tailing worker
CDC via Debezium	Correct	CDC infrastructure + Kafka
Event sourcing	Correct, different shape	High — rearchitects the write path entirely

How interviewers grade this

You never describe a "dual write to DB and Kafka". Ever.
You name outbox or CDC as the mechanism.
You distinguish CDC (DB first) from event sourcing (events first).
Your downstream consumers (search, cache, warehouse) build from the same event stream.
Consumers are idempotent; offsets enable replay after bugs.
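One way to get the idempotency the last point asks for is to track, per key, the highest log position already applied and skip anything at or below it. The sketch below assumes each event carries an `lsn` field that increases per key. The field names and the `apply_once` helper are hypothetical.

```python
def apply_once(view, applied_lsn, event):
    """Apply an event unless this key has already seen this LSN or a later one."""
    key = event["order_id"]
    if event["lsn"] <= applied_lsn.get(key, -1):
        return False                     # duplicate or stale redelivery
    view[key] = event["status"]
    applied_lsn[key] = event["lsn"]
    return True

deliveries = [
    {"order_id": 1, "lsn": 10, "status": "placed"},
    {"order_id": 1, "lsn": 11, "status": "paid"},
    {"order_id": 1, "lsn": 10, "status": "placed"},   # redelivered after a retry
    {"order_id": 1, "lsn": 11, "status": "paid"},     # redelivered after a retry
    {"order_id": 1, "lsn": 12, "status": "shipped"},
]
view, applied_lsn = {}, {}
applied = [apply_once(view, applied_lsn, e) for e in deliveries]
```

For a full rebuild, the view and the LSN map would be cleared together before replaying, otherwise every replayed event would be skipped as already applied.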

Variants

Outbox pattern

DB-side: extra `outbox` table written in the same tx. Tailed by a worker.

No CDC product needed. The service owns the publishing. Simple, robust, and a good fit for single-service event emission. The trade-off is that the app must write correct event payloads itself.

Pros

+No new infrastructure
+App controls event schema
+Trivial to reason about

Cons

−Each service implements it separately
−Requires a tailer per service
−Polling adds a latency floor equal to the poll interval (~1 s)

Choose this variant when

Single service emitting events
Existing Kafka or similar
Modest number of event types
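The outbox variant can be sketched with Python's standard `sqlite3` module standing in for the service's database. The table layout, the `sent` flag, and the `publish` callback are assumptions for illustration. A real relay would publish to Kafka or a similar broker.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        sent INTEGER NOT NULL DEFAULT 0
    );
""")

def place_order(conn, order_id):
    # Business row and outbox row commit (or roll back) together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id})),
        )

def relay_once(conn, publish):
    """Publish pending outbox rows in insertion order, then mark them sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # may repeat after a crash
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)

published = []
place_order(conn, 1)
place_order(conn, 2)
relay_once(conn, lambda event_type, payload: published.append((event_type, payload)))
```

If the relay crashes after publishing but before marking a row as sent, that row is published again on the next pass. This is why the relay is at-least-once and consumers still need to be idempotent.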

Debezium (log-tailing CDC)

Reads DB WAL; emits every committed change as a Kafka event. Schema-mirroring.

The industrial-strength answer. Every committed DB change is delivered at least once, in order per key, which is why consumers must be idempotent. Downsides: infrastructure to run, and an event schema that mirrors the DB schema (sometimes too low-level).

Pros

+Faithful to every DB change
+No app-side outbox work
+Works across many DBs (Postgres, MySQL, Mongo)

Cons

−New infrastructure (Kafka Connect, ZK/KRaft)
−Events mirror tables — might want business-level events
−Schema evolution across DB migrations requires care

Choose this variant when

Many services emitting events
Existing Kafka infra
Warehouse / search index rebuild flows
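As a rough illustration of what a consumer of row-level change events does, the sketch below applies simplified events to an in-memory index. The `op`, `before`, and `after` fields follow the shape of Debezium's change-event envelope, but the payloads are hand-written samples with the schema wrapper and `source` metadata omitted. The `apply_change` helper is hypothetical.

```python
def apply_change(index, event, key="id"):
    """Apply one row-level change event to a derived view keyed by primary key."""
    op = event["op"]
    if op in ("c", "u", "r"):                      # create, update, snapshot read
        row = event["after"]
        index[row[key]] = row                      # upsert: safe to apply twice
    elif op == "d":                                # delete
        index.pop(event["before"][key], None)      # no error if already gone
    return index

events = [
    {"op": "c", "before": None, "after": {"id": 7, "status": "placed"}},
    {"op": "u", "before": {"id": 7, "status": "placed"},
     "after": {"id": 7, "status": "shipped"}},
    {"op": "c", "before": None, "after": {"id": 8, "status": "placed"}},
    {"op": "d", "before": {"id": 8, "status": "placed"}, "after": None},
]
index = {}
for event in events:
    apply_change(index, event)
```

Because the handler mirrors columns directly, any column rename in the table shows up in `index` unchanged, which is the coupling listed under Cons.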

Event sourcing

Events are the source of truth; DB state is a projection.

Heaviest commitment. Every mutation is an event persisted first; state is rebuilt from the event log. Powerful for audit and replay; expensive in developer cognitive load.

Pros

+Full history is native
+Replay / time-travel debugging
+Natural audit log

Cons

−Schema evolution is a whole discipline
−Every developer on the team pays the cognitive tax
−Harder to reason about "current state"

Choose this variant when

Core domain with regulatory history requirements
When the audit trail is the product (finance, health)
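For contrast with CDC, a minimal event-sourcing sketch: state is never stored directly, only folded from the event log. `OrderPlaced` and `PriceChanged` are the event names used elsewhere in this article. `OrderCancelled`, the field names, and the `project` helper are assumptions added for illustration.

```python
from functools import reduce

def evolve(state, event):
    """Return the next state given the current state and one event."""
    kind = event["type"]
    if kind == "OrderPlaced":
        return {"order_id": event["order_id"], "status": "placed",
                "total": event["total"]}
    if kind == "PriceChanged":
        return {**state, "total": event["total"]}
    if kind == "OrderCancelled":
        return {**state, "status": "cancelled"}
    return state                      # unknown event types are ignored here

def project(events, upto=None):
    """Fold the first `upto` events (all of them if None) into current state."""
    return reduce(evolve, events[:upto], {})

log = [
    {"type": "OrderPlaced", "order_id": 42, "total": 100},
    {"type": "PriceChanged", "order_id": 42, "total": 90},
    {"type": "OrderCancelled", "order_id": 42},
]
current = project(log)
as_of_first_event = project(log, upto=1)   # time travel: state after event 1
```

The `upto` parameter is the time-travel benefit in miniature. The cost is that every read of current state goes through a projection like `evolve`.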

Worked example

Scenario: an e-commerce platform. orders table is primary. Search index, analytics warehouse, and cache all need to reflect changes.

Pipeline:

1. App writes to Postgres orders table (one write, one transaction).
2. Debezium tails the Postgres WAL, publishes orders.change events to Kafka (partitioned by order_id).
3. Consumer groups:

- Search-index consumer: upserts into Elasticsearch.
- Warehouse consumer: appends to a ClickHouse fact table.
- Cache consumer: invalidates or refreshes Redis entries by order_id.
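Partitioning by order_id in step 2 is what preserves per-order ordering: all changes to one order land in the same partition, in commit order. A small sketch of the idea follows, using `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function). The `seq` field and helper names are hypothetical.

```python
import zlib
from collections import defaultdict

def partition_for(order_id, num_partitions):
    # crc32 is a stand-in; the point is that the same key maps to the same partition.
    return zlib.crc32(str(order_id).encode("utf-8")) % num_partitions

def route(events, num_partitions=3):
    """Append each event to the partition chosen by its order_id."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event["order_id"], num_partitions)].append(event)
    return partitions

events = [
    {"order_id": 1, "seq": 1}, {"order_id": 2, "seq": 1},
    {"order_id": 1, "seq": 2}, {"order_id": 3, "seq": 1},
    {"order_id": 2, "seq": 2}, {"order_id": 1, "seq": 3},
]
partitions = route(events)
```

Ordering is only guaranteed within a partition, so two different orders may be processed in either relative order. That is usually acceptable because each consumer keys its work by order_id.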

Handling a bug in the search-index consumer:

Fix the consumer code.
Reset the consumer-group offset to the last-known-good (e.g. yesterday).
Replay. Search index rebuilds. No DB touched; no other consumer affected.
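The recovery steps above can be simulated with a plain list as the topic and a tiny consumer that tracks its own offset. The `Consumer` class and both handlers are illustrative assumptions. With a real Kafka deployment the offset reset would be done with the broker's consumer-group tooling.

```python
class Consumer:
    """Reads a shared log from its own offset and maintains a derived view."""

    def __init__(self, log, handler):
        self.log = log
        self.handler = handler
        self.offset = 0
        self.view = {}

    def poll(self):
        while self.offset < len(self.log):
            self.handler(self.view, self.log[self.offset])
            self.offset += 1

    def reset(self, offset):
        self.offset = offset

def buggy_handler(view, event):
    view[event["order_id"]] = event.get("state")   # bug: wrong field, stores None

def fixed_handler(view, event):
    view[event["order_id"]] = event["status"]      # upsert by key: idempotent

log = [
    {"order_id": 1, "status": "placed"},
    {"order_id": 2, "status": "placed"},
    {"order_id": 1, "status": "shipped"},
]
search = Consumer(log, buggy_handler)
cache = Consumer(log, fixed_handler)
search.poll()
cache.poll()
corrupted = dict(search.view)      # what the bug left behind

search.handler = fixed_handler     # 1. fix the consumer code
search.reset(0)                    # 2. reset its offset
search.poll()                      # 3. replay
```

Each consumer owns its offset, so resetting `search` leaves `cache` where it was. Replay is only safe because the fixed handler is an upsert.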

Cold-starting a new downstream (new warehouse table):

Snapshot + incremental: Debezium takes a snapshot of the table, emits it as events, then starts tailing the WAL from the snapshot LSN. New consumer replays from the snapshot offset.
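A simplified model of snapshot plus incremental: emit the table contents as read events tagged with the snapshot LSN, then continue with WAL entries strictly after that LSN. The data, the `lsn` field, and the `snapshot_then_stream` function are assumptions for illustration, not Debezium internals.

```python
def snapshot_then_stream(table_rows, snapshot_lsn, wal):
    """Yield the snapshot as read events, then WAL entries after the snapshot LSN."""
    for row in table_rows:
        yield {"op": "r", "lsn": snapshot_lsn, "after": row}
    for entry in wal:
        if entry["lsn"] > snapshot_lsn:   # earlier entries are already in the snapshot
            yield entry

# Table contents as of LSN 3, and a WAL that continues past it.
table_at_lsn_3 = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "placed"}]
wal = [
    {"op": "c", "lsn": 1, "after": {"id": 1, "status": "placed"}},
    {"op": "c", "lsn": 2, "after": {"id": 2, "status": "placed"}},
    {"op": "u", "lsn": 3, "after": {"id": 1, "status": "shipped"}},
    {"op": "u", "lsn": 4, "after": {"id": 2, "status": "paid"}},
    {"op": "c", "lsn": 5, "after": {"id": 3, "status": "placed"}},
]

stream = list(snapshot_then_stream(table_at_lsn_3, 3, wal))
new_table = {}
for event in stream:
    new_table[event["after"]["id"]] = event["after"]["status"]
```

The new downstream ends up with the same rows it would have had by consuming the WAL from the beginning, without needing the full history to be retained.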

What we do NOT do:

App does not publish to Kafka directly. App writes one transaction to one DB. That's it.

Good vs bad answer

Interviewer probe

“Your DB is the source of truth but you need the search index and cache to stay fresh. How?”

Weak answer

"After each DB write, the app also publishes to Kafka."

Strong answer

"Never dual-write — that's a drift bug waiting to happen. Instead, the app writes one transaction to Postgres. Debezium tails the WAL and publishes each committed change to Kafka. The search indexer and cache invalidator are two separate consumer groups on that topic. Consumers are idempotent. If a consumer bug corrupts the index, we fix the code, reset its offset, and replay — the DB is untouched. If we don't want the infra cost of Debezium, the outbox pattern — outbox table written in the same tx, tailed by a worker — gets us 80% of the value."

Why it wins: Rejects the dual-write anti-pattern, names the two viable options (CDC / outbox), and highlights replay as the operational win.

Interview playbook · 2–3 min when keeping multiple stores in sync comes up

When it comes up

A search index, cache, or warehouse must stay in sync with the DB
The interviewer asks how derived read models stay fresh
You are integrating microservices via events
Anyone proposes "write to the DB and publish to Kafka"

Order of reveal

1. Reject dual-write. I never write to the DB and the broker in two separate operations — a crash between them drifts the two stores. That is the bug to design out from the start.
2. Outbox or CDC. Either an outbox table written in the same transaction and tailed by a worker, or log-tailing CDC (Debezium) reading the WAL. Both publish iff the commit happened.
3. DB stays the truth. Search, cache, and warehouse are derived views built from the event stream — never the source of truth.
4. Idempotent + replayable. Consumers are idempotent, and I can rebuild a broken downstream by resetting the consumer offset and replaying.
5. Project to business events. If raw row-level CDC leaks schema, I project to business events (OrderPlaced) in a stream processor so consumers do not couple to columns.

Signature phrases

“Never dual-write to the DB and the broker — that is a drift bug waiting to happen.” — Names the anti-pattern interviewers want you to avoid.
“The DB's transaction log is already my most reliable event stream.” — Captures the core insight of CDC in one line.
“Downstreams are derived views I can rebuild by resetting the offset.” — Highlights replay as the operational superpower.
“Outbox if I want zero new infra; Debezium if many services emit events.” — Shows you match the mechanism to the team’s scale.

Likely follow-ups

“Outbox or Debezium: how do you choose?”

Outbox when a single service emits events and you do not want new infrastructure: an extra table written in the same transaction, tailed by a worker. Debezium (CDC) when many services emit events and you already run Kafka: it tails the WAL with no app-side changes and captures every commit faithfully. Outbox gives you ~80% of the value with a fraction of the operational footprint; CDC wins at fleet scale.

“How is CDC different from event sourcing?”

CDC is DB-first, events-derived: rows are the source of truth and the change log is generated from commits. Event sourcing is events-first, DB-derived: the event log is the source of truth and current state is a projection you rebuild. Most systems should do CDC — event sourcing is a much bigger commitment that taxes every developer with replay-and-projection thinking, justified mainly when the audit trail is the product.

“Your raw CDC events couple every consumer to table columns. How do you decouple?”

Insert a translation layer: a stream processor (Kafka Streams / Flink) consumes the raw row-change events and emits business-level events — OrderPlaced, PriceChanged — with a stable schema. Consumers subscribe to those, not to the table shape. Now a DB migration that renames a column does not ripple into every downstream, because the projection absorbs it.
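A sketch of such a translation layer as a pure function, independent of any particular stream processor. The column name `total_cents`, the output field names, and `to_business_events` are hypothetical. It also assumes update events carry a populated `before` image, which depends on how the source database is configured.

```python
def to_business_events(change):
    """Map one row-level change on the orders table to zero or more business events."""
    op = change["op"]
    before, after = change.get("before"), change.get("after")
    if op == "c":
        return [{"type": "OrderPlaced", "order_id": after["id"],
                 "total_cents": after["total_cents"]}]
    if op == "u" and before and before["total_cents"] != after["total_cents"]:
        return [{"type": "PriceChanged", "order_id": after["id"],
                 "old_total_cents": before["total_cents"],
                 "new_total_cents": after["total_cents"]}]
    return []          # changes with no business meaning are dropped

changes = [
    {"op": "c", "before": None, "after": {"id": 42, "total_cents": 1000}},
    {"op": "u", "before": {"id": 42, "total_cents": 1000},
     "after": {"id": 42, "total_cents": 900}},
    {"op": "u", "before": {"id": 42, "total_cents": 900},
     "after": {"id": 42, "total_cents": 900}},       # no price change: no event
]
business_events = [e for change in changes for e in to_business_events(change)]
```

If the column is later renamed, only this function changes. The `OrderPlaced` and `PriceChanged` payloads that consumers see stay the same.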

Common mistakes

Dual write to DB + Kafka

Classic bug: two writes, two failure modes, eventual drift. Use outbox or CDC.

CDC events that mirror raw tables (Advanced)

Downstream couples to every column change. Project to business-level events (OrderPlaced, PriceChanged) at the CDC layer or in a stream processor (Kafka Streams) to decouple schema.

No replayability

If you can't replay the event stream, you can't rebuild a broken downstream. Keep Kafka retention long enough (days to weeks) to cover your worst-case recovery.
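One hedged way to turn "long enough" into a number: add up how long a bug might take to detect, fix, and replay, then apply a safety factor. The breakdown and the factor of 2 are assumptions, not a rule. The result is in milliseconds because Kafka's topic-level `retention.ms` setting uses that unit.

```python
from datetime import timedelta

def retention_ms(detect_days, fix_days, replay_days, safety_factor=2.0):
    """Worst-case recovery window, padded by a safety factor, in milliseconds."""
    window = timedelta(days=detect_days + fix_days + replay_days) * safety_factor
    return int(window.total_seconds() * 1000)

# Example: 3 days to notice, 2 to fix, 2 to replay, doubled for safety = 14 days.
two_weeks_ms = retention_ms(detect_days=3, fix_days=2, replay_days=2)
```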

Event sourcing when CDC is enough (Advanced)

Event sourcing is a commitment. Most teams want CDC (DB-first, events-derived) — not event sourcing (events-first, DB-derived).

Practice drills

Explain outbox in 30 seconds.

In the same DB transaction as your business write, insert a row into an outbox table describing the event. A separate worker polls (or tails) outbox rows with status=pending, publishes them to Kafka, marks them sent. Atomicity: the business row and the outbox row commit together, so the event is published iff the business change committed.

Interviewer: "your search index is out of sync after a deploy. What now?"

Confirm with a spot-check. If the cause is a recently deployed consumer bug: roll back the consumer, reset its Kafka offset to before the bug, and replay. If the cause is a missed event (dual-write drift, for example): rebuild from the DB by taking a snapshot, emitting it as events, and replaying. Long-term: move to CDC so snapshots and incrementals are a built-in replay mechanism.

When would you pick event sourcing over CDC?

When the event log IS the product — regulated audit trails in finance, medical records, banking. When you need time-travel and cross-event analytical queries as a first-class thing. Otherwise, CDC gives you 90% of the operational benefit (replay, materialised views) at a fraction of the developer-cognitive cost.

Cheat sheet

•Never dual-write. Outbox or CDC.
•Outbox = app writes extra row in same tx; worker tails.
•CDC (Debezium) = log tailing; every commit becomes an event.
•Event sourcing ≠ CDC. Much bigger commitment.
•Consumers are idempotent; replay via offset reset is the superpower.
•Retain Kafka long enough to cover rebuild scenarios.

