Design a RAG Pipeline (AI Answer Engine) - SystemRound

Hard · System design · 45 min · 5 stages · Pro

Asked at: OpenAI, Anthropic, Google, Microsoft, Glean, Notion, Perplexity

Design a Retrieval-Augmented Generation system that answers natural-language questions over a large private corpus. The system ingests and indexes documents, retrieves the most relevant passages for each question, and has an LLM generate a grounded, cited answer. It must stay fresh as documents change, keep answers faithful to the corpus, control cost, and isolate thousands of tenants — at 50M documents, 500M vectors, and 5K queries per second.

Best after a few full reps. Expect follow-up questions, edge cases, and deeper trade-off discussion.

What this problem tests

Query Endpoint · Ingestion Endpoint · Auth & Controls · Core Components · Ingestion Flow · Query Flow (retrieve → generate)

Round shape: 5 stages

Time budget: 45 min

Feedback loop: Grade anytime

Guided practice · Primary loop

Workspace-first, hints visible, stage retry available. The cheap, repeatable loop — build the answer shape before you take it under pressure.

Stage-by-stage workspace instead of a blank page.
Grade one stage or the whole answer whenever you want.
Compare your reasoning against reference criteria and model answers.

Solve once, compare against the checklist, then come back to the weak stage instead of starting over.

Mock interview · Pressure test

Strict timer, hints hidden, debrief deferred to the end. Use this once you can already structure a clean answer and want to pressure-test pacing and pushback.

Best once the answer shape is already in your head.
Pressure-test pacing, pushback handling, and communication.
Use diagnosis after the interview for exact misses and next study steps.

Best after one structured rep · timed · focused on pacing and communication.

Requirements

This is the framing pass. A strong answer quickly defines what the system must do, what quality bar it has to hit, and the numbers that will justify the rest of the design.

First 5 min of the round

What must exist

Functional Requirements

1. Ingest documents from connected sources and index them: chunk, embed, and store so they are searchable

2. Retrieve the most relevant passages for a question via semantic (vector) search

3. Generate a grounded answer from those passages, with citations back to the source

4. Keep the index fresh: new, edited, and deleted documents are reflected within minutes

5. Respect per-tenant and per-document access control during retrieval

6. Below the line (out of scope): conversational memory, multimodal retrieval, model training
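
The ingest and retrieve requirements above can be sketched end to end in a few lines. This is a toy illustration, not a reference design: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and every name here (`Chunk`, `ingest`, `retrieve`, the tenant IDs) is hypothetical. Only the tenant filter is shown; per-document ACLs and the LLM generation step are left out.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    tenant_id: str
    doc_id: str
    text: str
    vector: Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a simple word-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def ingest(index: list, tenant_id: str, doc_id: str, text: str, chunk_words: int = 8) -> None:
    # Chunk by fixed word count, embed each chunk, and store it with its source.
    words = text.split()
    for i in range(0, len(words), chunk_words):
        piece = " ".join(words[i:i + chunk_words])
        index.append(Chunk(tenant_id, doc_id, piece, embed(piece)))


def retrieve(index: list, tenant_id: str, question: str, k: int = 2) -> list:
    q = embed(question)
    # Filter by tenant before scoring so other tenants' chunks are never candidates.
    candidates = [c for c in index if c.tenant_id == tenant_id]
    return sorted(candidates, key=lambda c: cosine(q, c.vector), reverse=True)[:k]


index: list = []
ingest(index, "acme", "handbook", "Vacation requests need manager approval two weeks ahead")
ingest(index, "globex", "wiki", "Vacation policy at Globex is unlimited")

hits = retrieve(index, "acme", "who approves vacation requests")
citations = [c.doc_id for c in hits]  # doc IDs to cite alongside the answer
print(citations)
```

Filtering on `tenant_id` before ranking, rather than after, is the point of the sketch: a post-filter can silently return fewer than `k` results or, worse, leak candidates into a shared scoring step.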

What good looks like

Non-Functional Requirements

1. Faithfulness: answers are grounded in the corpus; a confident wrong answer is worse than "I don't know"

2. Retrieval recall: the quality ceiling; if the right chunk isn't retrieved, the LLM can't use it

3. Latency: p95 < 2s end-to-end, first streamed token < 1s, retrieval < 200ms

4. Freshness: eventual consistency; new docs are retrievable within minutes and deletes take effect promptly

5. Isolation: hard per-tenant boundaries; retrieval never crosses tenants or ACLs
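
One way to make the faithfulness bar concrete is an abstain guard in front of generation. The sketch below is an assumption, not a prescribed mechanism: the `min_score` threshold of 0.5 is arbitrary, `answer_or_abstain` is a hypothetical helper, and `fake_generate` is a placeholder for the real LLM call.

```python
from typing import Callable


def answer_or_abstain(
    question: str,
    scored_passages: list[tuple[float, str]],
    generate: Callable[[str, list[str]], str],
    min_score: float = 0.5,  # assumed threshold; tune against labeled queries
) -> str:
    # Keep only passages whose retrieval score clears the threshold.
    supported = [text for score, text in scored_passages if score >= min_score]
    if not supported:
        # Nothing in the corpus supports an answer, so abstain instead of guessing.
        return "I don't know"
    return generate(question, supported)


def fake_generate(question: str, passages: list[str]) -> str:
    # Placeholder for the LLM call; it just echoes the first supporting passage.
    return f"Based on the corpus: {passages[0]}"


weak = answer_or_abstain(
    "What is the refund window?", [(0.12, "Office hours are 9 to 5")], fake_generate
)
strong = answer_or_abstain(
    "What is the refund window?", [(0.91, "Refunds are accepted within 30 days")], fake_generate
)
print(weak)
print(strong)
```

A score threshold alone does not guarantee a faithful answer; it only prevents generation when retrieval clearly found nothing relevant.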

Numbers to anchor the design

Scale Estimation

1. 50M documents × ~10 chunks ≈ 500M chunks/vectors

2. Vector storage: 500M × 768 dims × 4 bytes ≈ 1.5 TB raw; HNSW index overhead brings the total to ~2-3× raw → ~3-4.6 TB → must shard

3. Query load: 5K QPS average, ~15K peak

4. Ingestion: 10M docs/day × 10 chunks ÷ 86,400 s ≈ 1.2K embeddings/sec → a fleet of embedding workers

5. Per query: ~8 chunks × ~500 tokens ≈ 4K context tokens; generation tokens dominate the bill, so caching and model tiering matter
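
The estimates above can be reproduced with a few lines of arithmetic. This sketch uses only the figures listed; the variable names are illustrative, float32 storage is assumed from the "4 bytes" figure, and the 2-3× HNSW multiplier is the rough factor quoted above rather than a measured value.

```python
# Back-of-envelope numbers from the scale estimation list.
DOCS = 50_000_000
CHUNKS_PER_DOC = 10
DIMS = 768
BYTES_PER_DIM = 4              # float32
DOCS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 86_400
CHUNKS_PER_QUERY = 8
TOKENS_PER_CHUNK = 500

vectors = DOCS * CHUNKS_PER_DOC
raw_tb = vectors * DIMS * BYTES_PER_DIM / 1e12
indexed_tb_low, indexed_tb_high = raw_tb * 2, raw_tb * 3
embeds_per_sec = DOCS_PER_DAY * CHUNKS_PER_DOC / SECONDS_PER_DAY
context_tokens = CHUNKS_PER_QUERY * TOKENS_PER_CHUNK

print(f"vectors:          {vectors:,}")
print(f"raw storage:      {raw_tb:.2f} TB")
print(f"with HNSW (2-3x): {indexed_tb_low:.1f}-{indexed_tb_high:.1f} TB")
print(f"embeddings/sec:   {embeds_per_sec:,.0f}")
print(f"context tokens:   {context_tokens:,}")
```

In the round, state the inputs first and round aggressively; the point is the conclusion (multi-terabyte index, so shard; about a thousand embeddings per second, so a worker fleet), not the third significant digit.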

How the round unfolds

Each stage has a distinct job. Treat them like separate deliverables instead of one giant answer, and the round becomes much easier to navigate.

4 design stages · 40 pts after framing

🔌

Stage 2 · ~5 min · 10 pts

API Design

Define the contract clearly: the endpoints, auth boundary, error semantics, and the one or two decisions that matter most.

What you should produce

Let's define the interface.

Strong answers cover

Query Endpoint · Ingestion Endpoint · Auth & Controls
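
As an illustration of the kind of contract this stage asks for, here is a minimal sketch of request and response shapes. All type and field names (`QueryRequest`, `IngestRequest`, `top_k`, `operation`, and so on) are hypothetical, not taken from the reference answer. The tenant is deliberately absent from the request bodies, on the assumption that it is derived from the authenticated credential rather than trusted from the client.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRequest:
    question: str
    top_k: int = 8        # lines up with the ~8 chunks per query estimate
    stream: bool = True   # stream tokens to help the first-token latency target


@dataclass
class Citation:
    doc_id: str
    chunk_id: str


@dataclass
class QueryResponse:
    answer: str
    citations: list[Citation] = field(default_factory=list)


@dataclass
class IngestRequest:
    doc_id: str
    source_uri: str
    operation: str = "upsert"  # "upsert" or "delete"


def validate_ingest(req: IngestRequest) -> None:
    # Reject unknown operations up front so the error surfaces as a client error.
    if req.operation not in ("upsert", "delete"):
        raise ValueError(f"unsupported operation: {req.operation!r}")


resp = QueryResponse(
    answer="Refunds are accepted within 30 days.",
    citations=[Citation(doc_id="policy", chunk_id="policy#3")],
)
print(resp)
```

Making deletes an explicit ingestion operation keeps the freshness requirement visible in the API instead of hiding it in a background job.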

🏗️

Stage 3 · ~10 min · 10 pts

High-Level Architecture

Lay out the main components and trace the write path, read path, and any async path cleanly.

What you should walk through

Walk me through the architecture.

Strong answers cover

Core Components · Ingestion Flow · Query Flow (retrieve → generate)

💾

Stage 4 · ~10 min · 10 pts

Data Model & Storage

Pick the store, show the schema or key model, and explain why that storage choice fits the access pattern.

What you should lock in

Let's get concrete about storage.

Strong answers cover

Vector Store & Chunk Schema · ANN Index & Chunking · Partitioning & Isolation
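
A chunk schema and a partitioning rule might look like the sketch below. It is one possible shape under stated assumptions: the field names, the `acl_groups` representation, the shard count of 64, and routing by a hash of the tenant ID are all illustrative choices, not the reference answer.

```python
import hashlib
from dataclasses import dataclass

NUM_SHARDS = 64  # assumed shard count for illustration


@dataclass(frozen=True)
class ChunkRecord:
    chunk_id: str
    tenant_id: str                 # partition key and hard isolation boundary
    doc_id: str
    doc_version: int               # lets edits and deletes supersede stale chunks
    text: str
    embedding: tuple[float, ...]
    acl_groups: frozenset[str]     # groups allowed to see the parent document


def shard_for_tenant(tenant_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Use a stable hash (not Python's per-process hash()) so routing is
    # identical across workers and restarts.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


record = ChunkRecord(
    chunk_id="handbook#0",
    tenant_id="acme",
    doc_id="handbook",
    doc_version=3,
    text="Vacation requests need manager approval.",
    embedding=(0.1, 0.2, 0.3),
    acl_groups=frozenset({"hr", "managers"}),
)
print(shard_for_tenant(record.tenant_id))
```

Hashing by tenant keeps each query on one shard, but a very large tenant can overload its shard; a strong answer names that trade-off and how it would be handled, for example with dedicated shards for the largest tenants.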

📈

Stage 5 · ~15 min · 10 pts

Scaling & Deep Dive

Name the first bottleneck, failure modes, and the trade-offs that keep the system fast and reliable under pressure.

What you should pressure-test

Now the deep dive. Where are the bottlenecks at this scale, how do you keep latency and cost in check, and — critically — how do you keep the answe...

Strong answers cover

Bottlenecks & Latency/Recall · Faithfulness & Guardrails · Freshness & Reliability
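
Since generation cost is called out in the scale estimates, one lever worth being able to sketch is an answer cache. The version below is a minimal, assumption-laden example: `AnswerCache` is a hypothetical class, the key is the tenant plus a whitespace- and case-normalized question, and a TTL is used as a crude freshness bound. It ignores per-user ACLs, which a real cache key would need to account for.

```python
import time
from typing import Callable, Optional


class AnswerCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict = {}

    @staticmethod
    def _key(tenant_id: str, question: str) -> tuple:
        # Tenant is part of the key so answers never cross tenant boundaries.
        return (tenant_id, " ".join(question.lower().split()))

    def get(self, tenant_id: str, question: str) -> Optional[str]:
        key = self._key(tenant_id, question)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at > self._ttl:
            # Expired: drop it so stale answers do not outlive document changes.
            del self._entries[key]
            return None
        return answer

    def put(self, tenant_id: str, question: str, answer: str) -> None:
        self._entries[self._key(tenant_id, question)] = (self._clock(), answer)


cache = AnswerCache(ttl_seconds=300)
cache.put("acme", "What is the  refund window?", "30 days")
print(cache.get("acme", "what is the refund window?"))
print(cache.get("globex", "what is the refund window?"))
```

The TTL trades cost against the freshness requirement: a longer TTL saves more generation calls but lets a cached answer survive an edit or delete for longer.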

What a strong first rep looks like

Scope clearly

Translate the prompt into concrete requirements, scale, and trade-offs before drawing architecture.

Stay stage-specific

Give APIs in the API stage, data models in the storage stage, and failure modes in scaling. Don't blur them together.

Iterate fast

Grade early, compare to the reference reasoning criteria, fix the biggest misses, and re-submit the weak stage instead of starting over.