Pattern·Specialised shapes

Large file upload & blob handling

Chunked, resumable uploads direct to blob storage via signed URLs. The app server never touches the bytes; processing is always async.


Files above a few megabytes should never flow through your app server. The interview answer is always "signed URL, direct to S3/GCS" — the bytes never enter your fleet, and everything after the upload happens asynchronously.


Architecture diagram· Direct-to-blob with signed URLs

The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.

When to reach for this

Reach for this when…

User-uploaded media (video, images, audio)
Document processing and attachments
Backup / archive ingest
Any single file larger than ~10 MB
Uploads that need progress, resume, or scanning before use

Not really this pattern when…

Tiny payloads (< ~1 MB) embedded in a JSON request — just post the bytes
Server-generated content (there is nothing to upload)
A trusted internal batch transfer where a signed URL adds no value

Good vs bad answer

Interviewer probe

“How does video upload and processing work?”

Weak answer

"The client POSTs the video file to /upload and the server saves it to disk, then transcodes it before returning."

Strong answer

"The bytes never touch our app. The client requests a multipart upload from our API; the server returns short-lived signed URLs per 5 MB part, each constrained to the exact key, content-type, and size. The client uploads parts in parallel directly to S3 and retries only failed parts, then calls CompleteMultipartUpload. S3 emits an ObjectCreated event to SQS, and an idempotent worker (keyed by upload_id) AV-scans, sniffs the real MIME, transcodes to HLS variants, generates thumbnails, and flips the DB status from uploading to ready. The user sees an async progress indicator over SSE. Untrusted uploads land in a quarantine bucket; only post-scan files are promoted to the trusted, CDN-served bucket on a separate origin. Delivery is signed CDN URLs with range requests for seeking. Lifecycle rules abort incomplete multipart after 7 days and age old renditions to cold storage. Processing failures go to a DLQ."

Why it wins: Names signed multipart with constrained URLs, async idempotent processing, the two-bucket trust boundary with MIME sniffing, status-driven UX, signed CDN delivery, and lifecycle/cost hygiene. The weak answer streams bytes through the app and blocks the request on transcoding — the two cardinal sins.

Cheat sheet

•Bytes bypass the app — always signed direct-to-blob upload.
•Signed URL constrains method + key + content-type + content-length + short expiry.
•Single PUT under ~100 MB; multipart/resumable above it.
•Multipart = parallel parts + per-part retries; a blip costs one part, not the file.
•Processing is always async: blob event → queue → idempotent worker.
•Status model: uploading → uploaded → processing → ready/failed.
•A completed upload is not a ready asset — the processor owns the transition.
•Two-bucket trust boundary: untrusted → MIME sniff + AV scan → trusted.
•The declared MIME type is a claim — sniff magic bytes server-side.
•Serve from a separate origin via signed CDN URLs; buckets private by default.
•Lifecycle rule: abort incomplete multipart after ~7 days (phantom cost).
•Tier Standard → IA → Glacier and dedup by content hash to control cost.

Core concept

The single organising principle of this pattern is that the bytes bypass your application. Your app fleet is sized for small, fast JSON requests; routing multi-gigabyte files through it makes every instance a bandwidth bottleneck, an out-of-memory risk, and a hostage to slow-client timeouts. So the app's job shrinks to issuing permission and reacting to completion, while blob storage does the heavy lifting of ingest.

Architecture diagram· Direct-to-blob with signed URLs

The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.

The canonical flow:

1. Client requests a signed upload URL. POST /uploads with filename, content-type, and size. The server returns a short-lived (≈15-minute) pre-signed S3/GCS URL plus an upload_id, and records a row in status uploading.
2. Client PUTs bytes directly to blob storage. A single PUT for small files, multipart for large. Your fleet never sees a byte of payload.
3. Blob storage emits a completion event (S3 → SQS/EventBridge, GCS → Pub/Sub) when the object lands.
4. A processor worker consumes the event and runs the pipeline asynchronously: virus scan → transcode → thumbnail → metadata extraction → flip the DB row from uploading to ready.
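Step 1 of this flow is a small, cheap handler. The sketch below is a minimal in-memory version, assuming an `uploads` dict in place of the metadata table and a hypothetical `sign_put_url` placeholder where a real system would call its cloud SDK's presigner; it produces no real signature.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Optional

URL_TTL_SECONDS = 15 * 60  # ~15-minute expiry, as in step 1


@dataclass
class UploadRow:
    upload_id: str
    key: str
    filename: str
    content_type: str
    size: int
    status: str
    expires_at: float


uploads: dict = {}  # hypothetical stand-in for the metadata table


def sign_put_url(key: str, content_type: str, size: int, expires_at: float) -> str:
    # Hypothetical placeholder for a cloud SDK presigner; NOT a real signature.
    return f"https://blob.example/{key}?ct={content_type}&len={size}&exp={int(expires_at)}"


def create_upload(filename: str, content_type: str, size: int,
                  now: Optional[float] = None) -> dict:
    """POST /uploads: record a row in status 'uploading' and hand back a signed URL."""
    if size <= 0:
        raise ValueError("size must be positive")
    now = time.time() if now is None else now
    upload_id = uuid.uuid4().hex
    key = f"uploads/{upload_id}"  # server-chosen key; the filename is only metadata
    expires_at = now + URL_TTL_SECONDS
    uploads[upload_id] = UploadRow(upload_id, key, filename, content_type,
                                   size, "uploading", expires_at)
    # The response is a few hundred bytes of metadata; the payload goes to blob storage.
    return {"upload_id": upload_id,
            "url": sign_put_url(key, content_type, size, expires_at),
            "expires_at": expires_at}
```

Deriving the object key from `upload_id` rather than the client's filename means the client never chooses where its object lands.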

Multipart / resumable uploads are how large files survive flaky networks:

Architecture diagram· Multipart upload — parallel parts, per-part retries

A large file is split into N parts, each uploaded in parallel with its own signed URL; only failed parts retry, then a single Complete call assembles them.

The client splits the file into N parts (5 MB+ each), uploads them in parallel using per-part signed URLs, retries only the parts that fail, and finishes with a single CompleteMultipartUpload call that assembles them by ETag. A dropped connection costs one part, not the whole file. (tus.io and GCS resumable uploads are the same idea with a different API.)
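The client side of that loop (split, upload in parallel, retry only the failed part, then Complete) can be sketched against a fake store. `FlakyStore` is a hypothetical in-memory stand-in for S3 that drops chosen parts on their first attempt; a real client would PUT each part to its signed URL and read the ETag from the response header.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class FlakyStore:
    """Hypothetical blob-store stand-in: each part in fail_once fails on its first attempt."""

    def __init__(self, fail_once: set):
        self.fail_once = set(fail_once)
        self.parts: dict = {}
        self.attempts = 0
        self._lock = threading.Lock()

    def upload_part(self, part_number: int, data: bytes) -> str:
        with self._lock:
            self.attempts += 1
            if part_number in self.fail_once:
                self.fail_once.discard(part_number)
                raise ConnectionError(f"connection dropped on part {part_number}")
            self.parts[part_number] = data
        return hashlib.md5(data).hexdigest()  # per-part ETag, S3-style MD5

    def complete(self, etags: list) -> bytes:
        # Assemble in part-number order, rejecting any part whose ETag doesn't match.
        assembled = []
        for part_number, etag in sorted(etags):
            data = self.parts[part_number]
            if hashlib.md5(data).hexdigest() != etag:
                raise ValueError(f"ETag mismatch on part {part_number}")
            assembled.append(data)
        return b"".join(assembled)


def upload_multipart(store: FlakyStore, payload: bytes, part_size: int,
                     max_retries: int = 3, workers: int = 4) -> bytes:
    chunks = [payload[i:i + part_size] for i in range(0, len(payload), part_size)]

    def send(part_number: int, chunk: bytes) -> tuple:
        # Retries are per part: a failure here never touches the other parts.
        for attempt in range(max_retries + 1):
            try:
                return part_number, store.upload_part(part_number, chunk)
            except ConnectionError:
                if attempt == max_retries:
                    raise
        raise RuntimeError("unreachable")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        etags = list(pool.map(send, range(1, len(chunks) + 1), chunks))
    return store.complete(etags)
```

The tiny `part_size` in a test keeps things fast; against S3 it would be at least 5 MB.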

Security is the part juniors skip, and it's the most important. A signed URL must be narrowed to a single method (PUT), a single object key, an exact content-type, and a content-length cap, with a short expiry. Otherwise a leaked URL lets an attacker push content of any type and any size (and, with a loose key prefix in a POST policy, under keys you never intended). And the client's declared content-type is a claim, not a fact: you must sniff the real MIME / magic bytes server-side and scan for malware before the file is ever served, which is why production systems use a two-bucket trust boundary.
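Why the constraints must live inside the signature is easiest to see in a toy scheme. This is a hypothetical HMAC construction for illustration only, not AWS SigV4 or GCS V4 signing (providers do this verification for you); `SECRET`, `sign_upload_url`, and `verify_request` are made-up names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-signing-key"  # hypothetical; never shipped to clients


def _signature(method: str, key: str, content_type: str, max_len: int, expires: int) -> str:
    # Every constraint is in the signed message, so editing any of them breaks the signature.
    msg = "\n".join([method, key, content_type, str(max_len), str(expires)]).encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_upload_url(key: str, content_type: str, max_len: int,
                    ttl: int = 900, now: Optional[float] = None) -> str:
    expires = int((time.time() if now is None else now) + ttl)
    query = urlencode({"ct": content_type, "max": max_len, "exp": expires,
                       "sig": _signature("PUT", key, content_type, max_len, expires)})
    return f"https://blob.example/{key}?{query}"


def verify_request(url: str, method: str, content_type: str, length: int,
                   now: Optional[float] = None) -> bool:
    """The checks a storage edge would run before accepting any bytes."""
    parsed = urlparse(url)
    q = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    key = parsed.path.lstrip("/")
    expected = _signature(method, key, q["ct"], int(q["max"]), int(q["exp"]))
    now = time.time() if now is None else now
    return (hmac.compare_digest(expected, q["sig"])  # method, key, limits untampered
            and content_type == q["ct"]             # exact content-type
            and 0 < length <= int(q["max"])         # content-length cap
            and now < int(q["exp"]))                # short expiry
```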

Architecture diagram· Two-bucket trust boundary

Uploads land in an untrusted quarantine bucket; only after MIME sniffing and an AV scan are they promoted to the trusted, CDN-served bucket.

Uploads land in an untrusted quarantine bucket; only after passing MIME sniffing and an AV/policy scan are they promoted to a trusted bucket served from a separate origin via CDN. This prevents the classic stored-XSS attack where an HTML file uploaded as image/png gets served from your app's origin.
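The MIME-sniffing half of the promotion decision fits in a few lines. A minimal sketch, assuming only a handful of signatures (real sniffers such as libmagic know thousands) and a hypothetical `should_promote` policy; the AV scan is a separate step not shown here.

```python
from typing import Optional

# A few well-known file signatures (magic bytes).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(head: bytes) -> Optional[str]:
    """Infer the type from the object's first bytes, ignoring any declared type."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    if head[4:8] == b"ftyp":  # ISO base media 'ftyp' box at offset 4 (MP4 family)
        return "video/mp4"
    return None


def should_promote(declared: str, head: bytes, allowed: set) -> bool:
    """Promote out of quarantine only if the real type is allowed AND matches the claim."""
    actual = sniff_mime(head)
    return actual is not None and actual in allowed and actual == declared
```

An HTML payload labelled `image/png` has no PNG signature, so it never leaves quarantine.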

Interview walkthrough

Worked example: design video upload and processing

Architecture diagram· Direct-to-blob with signed URLs

The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.

Step 0 — the non-negotiable. The app server never receives the video bytes. Its job is to authorize and sign, then react to completion. Everything else is storage, queue, and workers.

Step 1 — request an upload session. POST /uploads with filename, content-type, and size. The server validates the request (including a size cap), creates a metadata row in status uploading, initiates an S3 multipart upload with the declared content-type, and returns the upload_id plus per-part signed URLs, each scoped to method PUT, the exact key, the upload_id, and a single part number, and expiring in ~15 minutes.

Step 2 — upload parts directly to S3. The client splits the file into parts (e.g. 16 MB each), uploads them in parallel, and retries only parts that fail. A dropped connection costs one part. When all parts succeed, the client calls CompleteMultipartUpload with the ordered (part, ETag) list.

Architecture diagram· Multipart upload — parallel parts, per-part retries

A large file is split into N parts, each uploaded in parallel with its own signed URL; only failed parts retry, then a single Complete call assembles them.

Step 3 — completion event → async pipeline. S3 emits an ObjectCreated event to SQS. An idempotent worker (keyed by upload_id) consumes it. The object is in an untrusted quarantine bucket at this point.

Architecture diagram· Async processing pipeline

Upload completion fans out through a queue to idempotent workers — scan, transcode, thumbnail, extract — each updating status; failures go to a DLQ.

Step 4 — scan, then process. The worker sniffs the real MIME from magic bytes, runs an AV/policy scan, and — on pass — transcodes the source into an HLS bitrate ladder, generates thumbnails, and extracts metadata. On fail, it deletes the object and marks the upload failed.

Step 5 — promote and flip status. Passing, processed assets are promoted to the trusted bucket (served from a separate origin). The worker flips the status uploading → uploaded → processing → ready. The client tracks this over SSE and shows progress; lower resolutions can be published first so the video is watchable while higher renditions finish.

Architecture diagram· Explicit status state machine

A completed upload is not a ready asset — the processor owns the transition. Clients poll or subscribe to this state.

Step 6 — deliver. Playback is served via signed CDN URLs from the private trusted origin, using range requests / adaptive-bitrate segments so players seek and stream without downloading the whole file.

Architecture diagram· Signed delivery via CDN

Downloads are served from a CDN with signed URLs / cookies — private by default, cached at the edge, never through the app.

Step 7 — operate and control cost. Lifecycle rules abort incomplete multipart uploads after 7 days and age old renditions Standard → IA → Glacier. Identical source files are deduplicated by content hash. Processing failures route to a DLQ for inspection.

Result. Bytes never touch the fleet; uploads are resilient and resumable; content is scanned before it's ever served; the user gets a live status; delivery is edge-cached and private; and storage cost is bounded by lifecycle and dedup. Every cardinal sin — bytes through the app, blocking on transcode, trusting the MIME type, public buckets, phantom multipart cost — is designed out.

Interview playbook

5-7 minutes in a 45-minute round: 1 min on bytes-bypass + signed URL, 1-2 min on multipart/resume, 2 min on async processing + status, 1-2 min on the trust boundary and delivery/cost.

When it comes up

Files are large enough to threaten app bandwidth, memory, or request timeouts
Uploads need resume, progress, scanning, processing, or CDN delivery
The prompt mentions videos, images, documents, attachments, or backups
A media platform, file sync, or UGC system is in scope

Order of reveal

1. Move bytes off the app. The app signs a constrained, short-lived URL; the client uploads direct to blob storage, and the fleet never touches the bytes.
2. Choose the upload mode. Single PUT under ~100 MB, multipart/resumable above it for parallel parts and per-part retries.
3. Process asynchronously. Completion emits an event to a queue; idempotent workers scan, transcode, and thumbnail while the user sees a processing status.
4. Define the trust boundary. Untrusted quarantine bucket → MIME sniff + AV scan → promote to a trusted bucket on a separate origin.
5. Model status explicitly. uploading → uploaded → processing → ready/failed; a completed upload is not a ready asset.
6. Deliver and control cost. Private buckets, signed CDN delivery with range requests; lifecycle tiering, multipart-abort, and dedup for cost.

Signature phrases

“The app signs intent; storage receives bytes” — Captures the core control-plane / data-plane split.
“Untrusted until scanned” — Signals security maturity and the two-bucket boundary.
“A completed upload is not a ready asset” — Names the async status contract the processor owns.
“The declared MIME type is a claim, not a fact” — Identifies the stored-XSS / disguised-content risk.
“Abandoned multipart uploads are a bill” — Shows operational cost hygiene most candidates miss.
“Downloads bypass the app too” — Extends the bypass principle to the delivery path via CDN.

Likely follow-ups

“What if the client loses network halfway through a large upload?”

Multipart upload makes this cheap: the client tracks which parts succeeded and re-requests signed URLs only for the missing parts, so a 5 GB upload that drops at 90% retries only the few parts that were in flight, not the whole file. For cross-session resume it persists the upload_id and the completed-part list (or queries ListParts). Abandoned sessions are reclaimed by a lifecycle rule that aborts incomplete multipart uploads after a few days so the orphaned parts stop costing money.

“What if the processing worker runs twice on the same upload?”

The completion event is at-least-once and workers retry, so processing must be idempotent. I key the work by upload_id plus the rendition/output key; if the output already exists with a matching checksum, the worker marks the status ready instead of re-transcoding. DB updates are upserts on the same key. Repeated genuine failures go to a DLQ rather than retrying forever.

“How do you stop someone uploading a malicious file?”

Defence in depth. The signed upload URL constrains method, exact key, content-type, and content-length, so a leaked URL can only overwrite one key with matching-type, matching-size content briefly. After upload, the file sits in an untrusted quarantine bucket where I sniff the real MIME from magic bytes (ignoring the declared type), run an AV/policy scan, and for images re-encode to strip embedded payloads. Only passing files are promoted to the trusted bucket, which is served from a separate origin so even a slip can't execute in the app's security context.

“How do you keep storage cost under control at scale?”

Three levers. Lifecycle rules age objects Standard → Infrequent Access → Glacier/Archive based on access pattern, which can cut storage spend ~10× for archive-heavy data, plus an abort-incomplete-multipart rule to kill phantom parts. Content-addressed deduplication stores identical files once with reference counts and lets the client skip re-uploading parts whose hash already exists. And serving through a CDN offloads egress bandwidth from origin. I'd only age objects to cold tiers whose access truly justifies the retrieval latency and fetch fee.

“How does the user see progress while processing takes minutes?”

An explicit status model — uploading → uploaded → processing → ready/failed — exposed by the API separately from the result URLs. The client subscribes over SSE/websocket (or polls) for live updates. For long transcodes I publish lower resolutions first so the asset is usable while higher renditions finish, and the processor owns every transition so the upload endpoint can return immediately with just a handle.

Canonical examples

→YouTube / Vimeo video upload
→Profile-photo and avatar upload
→Dropbox / Google Drive file sync
→GitHub LFS / release-asset upload
→Attachment upload in chat and email apps

Variants

Single signed-PUT upload

One pre-signed PUT straight to blob storage — the simplest correct upload.

Architecture diagram· Direct-to-blob with signed URLs

The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.

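For files below the multipart cutoff (~100 MB), a single pre-signed PUT is the right tool: simpler than multipart, with no assembly step. The app issues a URL constrained to method PUT, the exact object key, content-type, and a size limit, with a 15-minute expiry; the client PUTs the bytes directly; blob storage emits a completion event. How the size limit is expressed is provider-specific: an S3 presigned PUT can sign an exact Content-Length, while a true min/max range needs an S3 presigned POST policy (content-length-range) or GCS's x-goog-content-length-range header.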

This covers the vast majority of consumer uploads: profile photos, document attachments, short audio. The only thing it gives up is resumability: a dropped connection means re-uploading the whole file, which is acceptable for small files but not for large ones. Keep the size cap in the signed URL, not just in client validation, so a tampered client can't push a multi-gigabyte object (up to S3's 5 GB single-PUT maximum) into your bucket.
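The server-side half of that advice (pick the mode and refuse oversized requests before signing anything) might look like the sketch below. `choose_upload_mode`, the 100 MiB cutoff, and the caller-supplied `max_allowed` cap are illustrative assumptions.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB
MULTIPART_CUTOFF = 100 * MiB  # the ~100 MB rule of thumb


def choose_upload_mode(size: int, max_allowed: int) -> dict:
    """Decide single PUT vs multipart, enforcing the size cap before anything is signed."""
    if not 0 < size <= max_allowed:
        raise ValueError(f"size {size} outside allowed range (1..{max_allowed})")
    if size < MULTIPART_CUTOFF:
        # The exact declared size goes into the signed URL, so the PUT must match it.
        return {"mode": "single_put", "signed_content_length": size}
    return {"mode": "multipart"}
```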

Pros

+Simplest flow — one URL, one PUT, one event
+No multipart assembly or part-tracking state
+Bytes still bypass the app entirely

Cons

−No resumability — a dropped connection restarts the whole upload
−No parallelism, so large files are slow on lossy networks

Choose this variant when

Files comfortably under ~100 MB
Networks are reliable (server-to-server, good connectivity)
You want the least moving parts

Multipart / resumable upload

Split into parts, upload in parallel, retry only failed parts, assemble with one Complete call.

Architecture diagram· Multipart upload — parallel parts, per-part retries

A large file is split into N parts, each uploaded in parallel with its own signed URL; only failed parts retry, then a single Complete call assembles them.

The standard for large files. The client initiates a multipart upload to get an upload_id, splits the file into parts (S3 minimum 5 MB except the last), requests a signed URL per part, and uploads them in parallel. Each part returns an ETag; when all are done the client calls CompleteMultipartUpload with the list of part numbers and ETags, and storage stitches them into one object.
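Planning the parts is simple arithmetic, but it has to respect S3's limits: at least 5 MiB per part (except the last) and at most 10,000 parts per upload. A sketch, assuming a hypothetical `plan_parts` helper and a 16 MiB preferred part size:

```python
import math

MiB = 1024 * 1024
MIN_PART = 5 * MiB   # S3 minimum for every part except the last
MAX_PARTS = 10_000   # S3 maximum number of parts per upload


def plan_parts(size: int, preferred_part: int = 16 * MiB) -> list:
    """Return (part_number, offset, length) triples for an S3-style multipart upload."""
    # Grow the part size when the preferred size would need more than MAX_PARTS parts.
    part = max(MIN_PART, preferred_part, math.ceil(size / MAX_PARTS))
    plan = []
    offset, number = 0, 1
    while offset < size:
        length = min(part, size - offset)
        plan.append((number, offset, length))
        offset += length
        number += 1
    return plan
```

Each triple maps to one signed part URL; the client reads `length` bytes from `offset` and PUTs them as part `part_number`.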

The payoff is resilience and speed: parallel parts saturate bandwidth, and a network blip retries a single 5 MB part instead of restarting a 5 GB upload. The cost is operational hygiene — abandoned uploads leave billed parts behind, so you must set a lifecycle rule to abort incomplete multipart uploads after a few days. Resumable protocols like tus.io and GCS resumable uploads wrap the same mechanics with progress tracking and a resume token.

Pros

+Resumable — only failed parts retry, not the whole file
+Parallel parts maximise throughput on big files
+Progress reporting falls out naturally per part

Cons

−More client complexity (part tracking, Complete call)
−Abandoned uploads accrue cost without lifecycle cleanup

Choose this variant when

Files above ~100 MB or unreliable mobile networks
Users need upload progress and resume
Video, backups, large datasets

Two-bucket trust boundary

Quarantine bucket → MIME sniff + AV scan → promote to a trusted, CDN-served bucket.

Architecture diagram· Two-bucket trust boundary

Uploads land in an untrusted quarantine bucket; only after MIME sniffing and an AV scan are they promoted to the trusted, CDN-served bucket.

Any system accepting user content needs a trust boundary, because the client's declared content-type and extension are claims an attacker controls. Uploads land in an untrusted bucket that is never served publicly. A completion event triggers a scanner that sniffs the real MIME / magic bytes, runs an AV/policy check, and enforces content rules. Only files that pass are promoted (copied or moved) to a trusted bucket served from a different origin via CDN; failures are deleted and the upload marked failed.

This defeats the stored-XSS attack (HTML disguised as an image served from your origin), keeps malware out of the served path, and gives you a clean place to enforce per-file policy. It composes with both single-PUT and multipart — the scan runs on the completion event regardless of how the bytes arrived.

Pros

+Stops disguised-content and stored-XSS attacks
+Malware never reaches the served path
+Clear policy-enforcement and audit point

Cons

−Extra copy/promote step and a second bucket to manage
−A scanning delay before the asset becomes available

Choose this variant when

Any public or multi-tenant user-generated content
Files are served back to other users
Compliance / safety requires content scanning

tus.io resumable protocol

An open, HTTP-based resumable protocol with a resume token and offset-based continuation.

Architecture diagram· Explicit status state machine

A completed upload is not a ready asset — the processor owns the transition. Clients poll or subscribe to this state.

When you control both client and server and want resumability without coupling to a specific cloud's multipart API, the tus open protocol is a clean choice. The client creates an upload with a POST that declares the total size (Upload-Length) and receives the upload URL in the Location header, then sends PATCH requests that append data at increasing byte offsets (Upload-Offset). If the connection drops, it sends a HEAD to the upload URL to learn the server's current offset and resumes from there. The upload URL is effectively the resume token (the offset always comes from the server), so a paused upload can continue hours later or even on a different network.
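The resume contract is easy to see in a toy model. This in-memory sketch mimics tus semantics (creation declares Upload-Length, HEAD reports Upload-Offset, a PATCH that doesn't start at the current offset gets 409 Conflict); `TusLikeServer` and `resume_upload` are hypothetical names, and a real tus server speaks HTTP and persists to storage.

```python
import uuid
from typing import Optional


class TusLikeServer:
    """In-memory sketch: post() creates, head() reports the offset, patch() appends."""

    def __init__(self):
        self.uploads: dict = {}

    def post(self, upload_length: int) -> str:
        url = f"/files/{uuid.uuid4().hex}"
        self.uploads[url] = {"length": upload_length, "data": bytearray()}
        return url  # real tus returns this in the Location header

    def head(self, url: str) -> int:
        return len(self.uploads[url]["data"])  # Upload-Offset

    def patch(self, url: str, offset: int, chunk: bytes) -> int:
        u = self.uploads[url]
        if offset != len(u["data"]):
            raise ValueError("409 Conflict: offset mismatch")
        if offset + len(chunk) > u["length"]:
            raise ValueError("chunk exceeds declared Upload-Length")
        u["data"] += chunk
        return len(u["data"])


def resume_upload(server: TusLikeServer, url: str, payload: bytes,
                  chunk_size: int, drop_after: Optional[int] = None) -> int:
    """Continue from wherever the server says we are; optionally simulate a drop."""
    offset = server.head(url)  # never trust a locally cached offset
    sent = 0
    while offset < len(payload):
        if drop_after is not None and sent == drop_after:
            raise ConnectionError("network dropped")
        offset = server.patch(url, offset, payload[offset:offset + chunk_size])
        sent += 1
    return offset
```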

tus shines for desktop sync clients and long uploads over poor connectivity. The trade-off versus native S3 multipart is that you operate a tus server (or use a managed one) that ultimately writes to blob storage, adding a hop — but you gain a portable, well-specified resume contract independent of any cloud provider.

Pros

+Cloud-agnostic, well-specified resume contract
+Resume across sessions/networks via offset + token
+Good fit for desktop sync and very long uploads

Cons

−You run a tus server in front of blob storage (extra hop)
−Less "bytes never touch your fleet" than pure signed PUT

Choose this variant when

You control the client and want provider-portable resume
Long uploads over flaky networks (desktop sync)

Scaling path

v1 — signed single-PUT direct to blob

Get bytes off the app server with the least machinery.

The app issues a constrained, short-lived pre-signed PUT URL; the client uploads directly to S3/GCS. Even at v1, the bytes never touch your fleet — that's the non-negotiable foundation, not a later optimisation.

Architecture diagram· Direct-to-blob with signed URLs

The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.

Enforce method, key, content-type, content-length, and expiry in the signed URL. This is correct and complete for small files on reliable networks. It falls short the moment files get large or networks get lossy: a dropped connection restarts the whole upload.

What triggers the next iteration

No resumability — large uploads restart on any network blip
No parallelism, so big files are slow
No content validation yet — declared type is trusted

v2 — multipart + async processing

Handle large files reliably and move work off the request path.

Switch large files to multipart so parts upload in parallel and only failures retry. On completion, blob storage emits an event to a queue, and idempotent workers process it — scan, transcode, thumbnail — while the user sees a processing status. Never block an HTTP connection on transcoding.

Architecture diagram· Async processing pipeline

Upload completion fans out through a queue to idempotent workers — scan, transcode, thumbnail, extract — each updating status; failures go to a DLQ.

Add an explicit status model and a DLQ for poison messages. The new responsibilities: idempotent processing (the event can fire more than once) and cleanup of abandoned multipart uploads.

What triggers the next iteration

Processing event is at-least-once — workers must be idempotent
Abandoned multipart parts accrue storage cost
Long processing needs a status contract for the client

v3 — trust boundary + signed CDN delivery

Make uploads safe to serve and deliver them cheaply.

Insert the two-bucket trust boundary: uploads land in quarantine, get MIME-sniffed and AV-scanned, and are promoted to a trusted bucket only on pass. Serve downloads through a CDN with signed URLs / cookies from a private origin — never through the app, and never public-by-default.

Architecture diagram· Signed delivery via CDN

Downloads are served from a CDN with signed URLs / cookies — private by default, cached at the edge, never through the app.

Now content is safe to serve, delivery is edge-cached and cheap, and buckets are private by default. The remaining concern is cost discipline as volume grows.

What triggers the next iteration

Scan adds latency before an asset is available
Hot storage cost grows with retained volume
Promote step doubles writes briefly (copy)

v4 — lifecycle, dedup, and cost control at scale

Keep storage and bandwidth costs sane as the corpus grows huge.

Add lifecycle rules to age objects from Standard → Infrequent Access → Glacier/Archive and to abort incomplete multipart uploads automatically. Deduplicate identical content by hashing (content-addressed storage), so the same file uploaded by many users is stored once. Tune CDN cache TTLs and use range requests for large media streaming.
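Content-addressed dedup reduces to a hash-keyed store with reference counts. A minimal in-memory sketch; `DedupStore` is hypothetical, and in production the digest should be computed or verified server-side, because trusting a client-supplied hash lets a caller claim a file it never uploaded.

```python
import hashlib


class DedupStore:
    """Identical bytes are stored once; each user-visible file holds a reference."""

    def __init__(self):
        self.blobs: dict = {}      # sha256 hex -> bytes
        self.refcount: dict = {}   # sha256 hex -> number of files pointing at it
        self.files: dict = {}      # file_id -> sha256 hex

    def put(self, file_id: str, data: bytes) -> bool:
        """Store a file; returns True only if new bytes actually had to be written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.files[file_id] = digest
        return is_new

    def delete(self, file_id: str) -> None:
        digest = self.files.pop(file_id)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:  # last reference gone: reclaim the bytes
            del self.refcount[digest]
            del self.blobs[digest]
```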

Architecture diagram· Storage class lifecycle

Lifecycle rules age objects from hot to archive tiers and abort abandoned multipart uploads, cutting cost by an order of magnitude.

At this stage storage strategy is a first-class cost lever — archive-heavy workloads can cut storage spend ~10× with lifecycle transitions, and dedup plus compression cut it further.

What triggers the next iteration

Cold-tier retrieval has latency and per-GB fetch cost
Dedup needs a content hash index and reference counting
Cache invalidation on re-uploaded/replaced assets

Deep dives

Why the bytes must bypass the app server

Architecture diagram· Direct-to-blob with signed URLs

The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.

This is the load-bearing decision of the whole pattern, and naming all three reasons signals you've actually run an upload system.

Bandwidth. Your app fleet is provisioned for small JSON requests. Stream multi-gigabyte files through it and each instance's network card becomes the bottleneck; you'd scale the fleet for transfer capacity it should never carry. Blob storage is purpose-built for exactly this ingest and is effectively infinitely scalable for it.

Memory. A naive handler buffers the upload in memory and OOMs on large files; even a streaming handler ties up a worker and disk for the entire transfer. Multiply by concurrent uploads and the fleet falls over.

Timeouts and resumability. A large upload over a mobile network can outlast your HTTP request timeout, and a single dropped connection loses the whole transfer. Signed URLs hand the transfer to the storage layer, and multipart gives the client per-part retries and resume — capabilities your app server can't easily offer.

So the app does the cheap part: a millisecond of work to mint a constrained signed URL, and a reaction to the completion event. The storage layer does the expensive part. This separation — the app signs intent; storage receives bytes — is the sentence to lead with.

Multipart mechanics, resumability, and cleanup

Architecture diagram· Multipart upload — parallel parts, per-part retries

A large file is split into N parts, each uploaded in parallel with its own signed URL; only failed parts retry, then a single Complete call assembles them.

Multipart upload is three calls plus N part uploads, and understanding the shape lets you reason about its failure modes.

1. Initiate → storage returns an upload_id that ties the parts together.
2. Upload parts → the client PUTs each part (5 MB+ for S3, except the final part) to a per-part signed URL, in parallel, and collects the returned ETags.
3. Complete → the client sends the ordered list of (part_number, ETag); storage validates and assembles the final object atomically. Until Complete, no object is visible.

Resumability falls out of this: the client tracks which parts succeeded, and on a network failure re-requests signed URLs only for the missing parts. A 5 GB upload that drops at 90% retries only the handful of parts that were in flight, not the 4.5 GB already stored. For cross-session resume you persist the upload_id and the completed-part list (or query storage's ListParts).

The cleanup trap. If the client never calls Complete (closed the tab, crashed), the uploaded parts sit in storage billed indefinitely: invisible because there's no final object, but very much on the invoice. The fix is mandatory: a lifecycle rule to abort incomplete multipart uploads after a few days. "Abandoned multipart uploads are a bill" is a phrase that shows operational maturity. Part size balances parallelism (more parts means more concurrency and finer retry granularity) against overhead (each part is a request). A few tens of MB per part is a common sweet spot for large media, within S3's limits of 5 MB minimum per part (except the last) and 10,000 parts per upload, so very large files need proportionally larger parts.
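The abort rule, and the tiering that usually sits next to it, is a few lines of bucket configuration. The dict below follows the shape of an S3 lifecycle configuration as accepted by PutBucketLifecycleConfiguration (for example boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle)`); the rule IDs, the `renditions/` prefix, and the day counts are illustrative assumptions.

```python
import json

lifecycle = {
    "Rules": [
        {
            "ID": "abort-incomplete-multipart",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # applies to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
        {
            "ID": "tier-old-renditions",
            "Status": "Enabled",
            "Filter": {"Prefix": "renditions/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        },
    ]
}

print(json.dumps(lifecycle, indent=2))
```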

The trust boundary matters more than the upload API

Architecture diagram· Two-bucket trust boundary

Uploads land in an untrusted quarantine bucket; only after MIME sniffing and an AV scan are they promoted to the trusted, CDN-served bucket.

Most candidates spend their time on the upload mechanics and skip the security model — which is backwards, because the trust boundary is where real systems get breached.

The client's content-type and extension are claims, not facts. An attacker can upload an HTML file labelled image/png. If you store it and later serve it from your app's origin, whether under a type derived from an attacker-chosen extension or to a browser that content-sniffs the bytes, it can render as HTML and you have stored XSS: the file runs in your users' browsers in your origin's security context. The same applies to malware, oversized files, and content-policy violations.

The defence is a two-bucket boundary:

1. Uploads land in an untrusted bucket that is never publicly served.
2. The completion event triggers a scanner that sniffs the real MIME from magic bytes (ignoring the declared type), runs AV / policy scanning, and enforces size and content rules.
3. Only files that pass are promoted to a trusted bucket served from a separate origin (so even a slip can't execute in your app's origin), via CDN with signed delivery.
4. Failures are deleted and the upload marked failed.

The signed upload URL does its share too — constraining method, key, content-type, and content-length shrinks the blast radius of a leaked URL to "overwrite one key with matching-type, matching-size content within 15 minutes." Defence in depth: constrain the URL and quarantine-then-scan.

Async processing and the status contract

Architecture diagram· Async processing pipeline

Upload completion fans out through a queue to idempotent workers — scan, transcode, thumbnail, extract — each updating status; failures go to a DLQ.

A completed upload is not a ready asset. Scanning and transcoding take seconds to minutes, and you must never hold an HTTP connection open for them. So processing is async, and the user experience is the status model.

Architecture diagram· Explicit status state machine

A completed upload is not a ready asset — the processor owns the transition. Clients poll or subscribe to this state.

The upload-complete event flows through a queue to idempotent workers — scan, transcode to HLS/DASH variants, generate thumbnails, extract metadata — each advancing an explicit status: uploading → uploaded → processing → ready (or failed/canceled). The API exposes this status (and the result URLs) separately, and the client either polls it or subscribes via SSE/websocket for a live progress indicator.
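The status contract is easiest to keep honest as an explicit transition table. A sketch, assuming these six states and that only uploading/uploaded may be canceled (a design choice, not something the pattern mandates); in a database the same check becomes a conditional update that sets the new status only where the current one matches.

```python
from enum import Enum


class Status(Enum):
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"
    CANCELED = "canceled"


# Allowed transitions; anything else is a bug or a stale event.
TRANSITIONS = {
    Status.UPLOADING: {Status.UPLOADED, Status.FAILED, Status.CANCELED},
    Status.UPLOADED: {Status.PROCESSING, Status.FAILED, Status.CANCELED},
    Status.PROCESSING: {Status.READY, Status.FAILED},
    Status.READY: set(),
    Status.FAILED: set(),
    Status.CANCELED: set(),
}


def advance(current: Status, target: Status) -> Status:
    if target == current:
        return current  # re-delivered event: a no-op, not an error
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```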

Idempotency is mandatory because the completion event is delivered at-least-once and workers can be retried: key the work by upload_id + rendition, and if the output already exists with a matching checksum, mark ready instead of re-transcoding. Repeated failures route to a DLQ for inspection rather than retrying forever. The mental shift: the upload endpoint returns fast with a handle; the processor owns the processing → ready transition; the client tracks state. "A completed upload is not a ready asset; the processor owns that transition" is the framing.
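The retry-safety rules in the paragraph above can be modelled in memory. In this sketch all names are hypothetical and dictionaries stand in for blob storage and the status table. Each output records the checksum of the source it was built from, and the attempt counter lives on the event. A real queue would usually track the receive count itself.

```python
import hashlib


class Worker:
    """Idempotent transcode worker with in-memory stand-ins for storage, DB and DLQ."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts  # hypothetical retry budget before dead-lettering
        self.outputs = {}   # (upload_id, rendition) -> checksum of the source it was built from
        self.status = {}    # (upload_id, rendition) -> "ready" | "failed"; writes are upserts
        self.dlq = []       # events parked for inspection
        self.transcodes = 0

    def transcode(self, source: bytes, rendition: str) -> bytes:
        if not source:
            raise ValueError("corrupt source")  # placeholder for a real failure
        self.transcodes += 1
        return source  # placeholder for the real rendition bytes

    def handle(self, event: dict, source: bytes) -> str:
        key = (event["upload_id"], event["rendition"])
        checksum = hashlib.sha256(source).hexdigest()
        if self.outputs.get(key) == checksum:
            self.status[key] = "ready"  # duplicate delivery: short-circuit, no re-transcode
            return "skipped"
        try:
            self.transcode(source, event["rendition"])
        except Exception:
            event["attempts"] = event.get("attempts", 0) + 1
            if event["attempts"] >= self.max_attempts:
                self.dlq.append(event)
                self.status[key] = "failed"
                return "dead-lettered"
            return "retry"  # leave it for the queue to redeliver
        self.outputs[key] = checksum
        self.status[key] = "ready"
        return "done"
```

A second delivery of the same event returns `"skipped"` without transcoding again. A poison event is retried twice and then parked on the DLQ instead of looping forever.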

Delivery: private by default, signed, CDN-cached

Architecture diagram· Signed delivery via CDN

Downloads are served from a CDN with signed URLs / cookies — private by default, cached at the edge, never through the app.

Upload is only half the system — serving the bytes back has its own pitfalls, and the defaults are dangerous.

Never public-by-default. A bucket with a public-read ACL means anyone who guesses or obtains a URL can read every object. Buckets should be private; access is granted per request via a signed download URL (or signed cookies for a session) that the app mints after an authorization check. This keeps the access decision in your code, not in a bucket ACL.

Serve through a CDN, not the app. Just as uploads bypass the app, downloads should too. The app issues a short-lived signed URL pointing at the CDN; the CDN serves from cache, fetching from the private origin bucket only on a miss. This gives edge latency, offloads bandwidth, and — for large media — supports range requests so players can seek and stream without downloading the whole file.
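The CDN handles range requests for you, but it helps to see what a single-range request resolves to. This minimal sketch covers the three single-range forms of the HTTP `Range: bytes=start-end` header: explicit, open-ended and suffix. It returns `None` where the range cannot be honoured. Multi-range requests are deliberately not handled.

```python
import re
from typing import Optional, Tuple

_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single-range header to an inclusive (start, end) byte pair.

    Returns None if the header cannot be honoured as a single range: it is
    malformed, or unsatisfiable (which a server answers with 416).
    """
    m = _SINGLE_RANGE.match(header.strip())
    if m is None or size == 0:
        return None
    first, last = m.group(1), m.group(2)
    if first == "" and last == "":
        return None
    if first == "":  # suffix form "bytes=-N": the final N bytes
        n = int(last)
        return (max(size - n, 0), size - 1) if n > 0 else None
    start = int(first)
    end = size - 1 if last == "" else min(int(last), size - 1)  # clamp past-the-end
    if start >= size or start > end:
        return None
    return (start, end)


def content_range(start: int, end: int, size: int) -> str:
    """Header value for the matching 206 Partial Content response."""
    return f"bytes {start}-{end}/{size}"
```

A player seeking into a 10,000-byte file with `bytes=9500-` gets bytes 9500 to 9999 back with `Content-Range: bytes 9500-9999/10000`, without downloading the rest.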

Large-media specifics. Video is delivered as adaptive-bitrate segments (HLS/DASH) produced during transcoding, so the player fetches small cached segments at a bitrate matched to the viewer's bandwidth. Signing the manifest URL alone does not protect the segments, because each segment is fetched by a separate request. Typical fixes are signed cookies, a signature scoped to the asset's path prefix, or a manifest rewritten with per-segment signed URLs. The throughline with the rest of the pattern: the app stays on the control path (authorize, sign, react), and the storage + CDN layer owns the data path in both directions.
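One way to cover a manifest and all of its segments with a single grant is to sign a path prefix instead of one URL. The sketch below uses a hypothetical HMAC token. It is not CloudFront's or any other CDN's actual format, since real CDNs provide their own signed-cookie or token schemes. `CDN_KEY`, the `/videos/<id>/` layout and the ACL dictionary are assumptions. Authorization runs in app code before anything is signed, and the edge normalises the path so that `..` cannot escape the prefix.

```python
import hashlib
import hmac
import posixpath
import time
from typing import Optional

CDN_KEY = b"shared-with-cdn-edge"  # hypothetical key the edge uses to verify tokens


def can_view(user_id: str, video_id: str, acl: dict) -> bool:
    return user_id in acl.get(video_id, set())


def mint_token(user_id, video_id, acl, ttl=300, now=None) -> Optional[str]:
    """App side: authorize first, then sign a short-lived grant for the whole prefix."""
    if not can_view(user_id, video_id, acl):
        return None
    prefix = f"/videos/{video_id}/"
    exp = int((now if now is not None else time.time()) + ttl)
    sig = hmac.new(CDN_KEY, f"{prefix}|{exp}".encode(), hashlib.sha256).hexdigest()
    return f"{prefix}|{exp}|{sig}"


def edge_allows(path: str, token: str, now=None) -> bool:
    """Edge side: valid signature, path under the signed prefix, not expired."""
    prefix, exp, sig = token.split("|")
    expected = hmac.new(CDN_KEY, f"{prefix}|{exp}".encode(), hashlib.sha256).hexdigest()
    return (hmac.compare_digest(sig, expected)
            and posixpath.normpath(path).startswith(prefix)
            and (now if now is not None else time.time()) <= int(exp))
```

The same token admits `master.m3u8` and every segment under the prefix, which covers the case where a signed manifest alone would leave the segments open.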

Storage classes, lifecycle, and dedup as cost levers

Architecture diagram· Storage class lifecycle

Lifecycle rules age objects from hot to archive tiers and abort abandoned multipart uploads, cutting cost by an order of magnitude.

At scale, blob storage cost is dominated by what you keep and how — and lifecycle policy is the lever most candidates never mention.

Storage classes trade retrieval latency/cost for storage price. A typical policy: Standard for the first 30 days (frequent access), Infrequent Access / Nearline for 30–90 days, Glacier / Archive beyond 90 days for objects rarely read. Lifecycle rules automate the transitions, and for archive-heavy workloads (backups, old media) this can cut storage spend by roughly an order of magnitude. The catch: cold tiers have retrieval latency (minutes to hours) and a per-GB fetch fee, so only age objects whose access pattern truly justifies it.

Cleanup as cost control. The abort-incomplete-multipart rule (covered earlier) prevents phantom part storage. Expiring temporary derivatives and old versions keeps the bucket from growing without bound.
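The tiering and cleanup rules can be written as an S3-style lifecycle configuration. This sketch builds the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The day thresholds follow the policy described above. The 7-day abort window, the rule IDs and the `tmp/` prefix for temporary derivatives are assumptions. GCS expresses the same ideas in its own lifecycle format.

```python
def lifecycle_configuration(abort_after_days: int = 7) -> dict:
    """S3-style lifecycle rules: age objects to colder classes and clean up."""
    return {
        "Rules": [
            {
                "ID": "tier-and-abort-multipart",
                "Filter": {"Prefix": ""},  # every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                    {"Days": 90, "StorageClass": "GLACIER"},      # archive tier
                ],
                # Reclaim parts of multipart uploads that were never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_after_days},
            },
            {
                "ID": "expire-temporary-derivatives",
                "Filter": {"Prefix": "tmp/"},  # hypothetical prefix for scratch output
                "Status": "Enabled",
                "Expiration": {"Days": 1},
            },
        ]
    }
```

With boto3 this would be applied as `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-uploads", LifecycleConfiguration=lifecycle_configuration())`, where the bucket name is a placeholder.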

Deduplication. Hashing object content (content-addressed storage) means the same file uploaded by many users — a viral video, a common attachment — is stored once with reference counts, not N times. Combined with compression for compressible types, dedup can dramatically cut both storage and the bandwidth of re-uploads (the client can skip uploading a part whose hash already exists). The summary: treat storage tier, lifecycle, and dedup as explicit design choices, not defaults — they're often the largest line item.
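Content addressing with reference counts fits in a small class. This minimal in-memory sketch assumes SHA-256 as the content hash. In production the reference counts would live in a transactional metadata store, and reclamation would more likely be an asynchronous cleanup pass than an inline delete. `has` is the check a client would make before uploading.

```python
import hashlib


class ContentStore:
    """Content-addressed blob store with reference counts (in-memory sketch)."""

    def __init__(self):
        self.blobs = {}  # sha256 hex -> bytes; each distinct content stored once
        self.refs = {}   # sha256 hex -> number of logical files pointing at it

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data  # first copy: actually store the bytes
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def has(self, digest: str) -> bool:
        return digest in self.blobs

    def release(self, digest: str) -> None:
        self.refs[digest] -= 1
        if self.refs[digest] == 0:  # last reference gone: reclaim the storage
            del self.refs[digest]
            del self.blobs[digest]
```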

Decision levers

Single PUT vs multipart

Cutoff around 100 MB. Below: single signed PUT is simpler. Above (or on lossy mobile networks): multipart wins — parallel parts, part-level retries, resume after a blip. Always cap content-length in the signed URL regardless.
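The cutoff and part sizing can be made mechanical. This sketch assumes S3's multipart limits: a 5 MiB minimum for every part except the last, at most 10,000 parts per upload, and a 5 TiB maximum object size. It also assumes a hypothetical 8 MiB preferred part size. The part size grows only when a file would otherwise need more than 10,000 parts. Within 5 TiB it stays far below S3's 5 GiB per-part maximum, so that limit is not checked here.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB
TiB = 1024 * GiB

SINGLE_PUT_CUTOFF = 100 * MiB  # the ~100 MB cutoff discussed above
MIN_PART = 5 * MiB             # S3: minimum size of every part except the last
MAX_PARTS = 10_000             # S3: maximum parts per upload
MAX_OBJECT = 5 * TiB           # S3: maximum object size


def plan_upload(size: int, preferred_part: int = 8 * MiB) -> tuple:
    """Return ("single", 1, size) or ("multipart", part_count, part_size)."""
    if size > MAX_OBJECT:
        raise ValueError("object exceeds the 5 TiB limit")
    if size <= SINGLE_PUT_CUTOFF:
        return ("single", 1, size)
    # Grow the part size only as much as needed to stay within MAX_PARTS (integer ceil).
    part = max(preferred_part, MIN_PART, (size + MAX_PARTS - 1) // MAX_PARTS)
    return ("multipart", (size + part - 1) // part, part)
```

A 1 GiB video becomes 128 parts of 8 MiB. Each part gets its own signed URL and can be retried independently.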

Processing: sync vs async

Always async. Upload completion emits an event; idempotent workers process it (scan, transcode, thumbnail) while the user sees a processing status. Never block an HTTP connection on transcoding; route poison messages to a DLQ.

Trust boundary placement

Validate content-type and size in the signed URL (server-enforced). After upload, sniff real MIME / magic bytes and AV-scan in an untrusted bucket; promote only passing files to a trusted bucket served from a separate origin.

Delivery path

Private buckets by default; serve via signed download URLs / cookies through a CDN from a private origin. Use range requests and adaptive-bitrate segments for large media. Downloads bypass the app just like uploads.

Storage class strategy

Standard for hot (0–30d), Infrequent Access/Nearline (30–90d), Glacier/Archive (90d+). Lifecycle rules automate transitions for ~10× archive savings, plus abort-incomplete-multipart cleanup. Cold tiers add retrieval latency and fetch cost.

Deduplication

Content-address by hash so identical files are stored once with reference counts; skip uploading parts whose hash already exists (delta sync). Combine with compression. The largest cost saving for high-overlap corpora.

Failure modes

Bytes routed through the app server

The fleet becomes the bandwidth bottleneck, instances OOM on big files, and slow uploads eat the request timeout. Always issue a signed URL and upload direct to blob storage.

No upload size cap

A tampered client uploads a 100 GB object to your bucket. Enforce content-length in the signed URL, set bucket quotas, and alert on unusual growth.

Public-read by default

A default-public bucket ACL means anyone with a URL reads any object. Default to private; serve via signed download URLs or a CDN with signed cookies.

Trusting the declared MIME type

A client labels an HTML file image/png; served from your origin it becomes stored XSS. Sniff real MIME / magic bytes server-side and serve from a separate origin.

Synchronous processing on the request

Blocking the HTTP connection on transcoding ties up workers and times out. Process async via blob event → queue → idempotent worker, with a status model.

Forgotten multipart cleanup (Advanced)

Abandoned uploads leave parts billed forever, invisible because there is no final object. Add a lifecycle rule to abort incomplete multipart uploads after a few days.

Non-idempotent processing (Advanced)

At-least-once completion events plus worker retries cause duplicate transcodes or double DB writes. Key work by upload_id + rendition and short-circuit if the output already exists.

Case studies

Dropbox

Dropbox — content-addressed blocks, dedup, and resumable sync

Dropbox's storage system splits every file into fixed-size blocks (historically 4 MB) and addresses each block by the hash of its contents. This content-addressing yields two big wins that this pattern highlights as cost levers. First, deduplication: if a block's hash already exists in storage, it isn't uploaded or stored again — so a file shared across many users, or an unchanged file re-synced, costs almost nothing. Second, delta sync: when a file changes, only the blocks whose hashes changed are uploaded, not the whole file.

The client maintains a manifest of block hashes; on sync it asks the server which blocks are missing and uploads only those, then commits the file as an ordered list of block hashes. This is multipart upload taken to its logical conclusion — parts are content-addressed and globally deduplicated — and it makes uploads inherently resumable: an interrupted sync resumes by re-checking which blocks still need sending.
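The missing-block exchange can be sketched in a few lines. This is not Dropbox's client code. SHA-256 block hashes, the `server_blocks` dictionary standing in for the "which blocks are missing?" call, and the tiny block size used in the tests are all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, per the historical figure above


def sync(data: bytes, server_blocks: dict, block_size: int = BLOCK_SIZE) -> tuple:
    """Upload only blocks the server lacks; return (manifest, blocks_sent)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    manifest = [hashlib.sha256(b).hexdigest() for b in blocks]
    # Ask the server which hashes it lacks (a set lookup stands in for the RPC).
    missing = {h for h in manifest if h not in server_blocks}
    for h, block in zip(manifest, blocks):
        if h in missing:
            server_blocks[h] = block  # re-storing an identical block is harmless
    # Committing the file means storing this ordered hash list as its metadata.
    return manifest, len(missing)
```

Editing the middle of a file re-sends only the changed block. Re-running `sync` after an interruption sends only what never arrived, because the missing set is recomputed from the server's view.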

Dropbox also famously separated the metadata plane (which blocks make up which file, permissions, sync state) from the block storage plane (the bytes), so the control path and data path scale independently — exactly the "app signs intent; storage holds bytes" separation. The takeaway: content-addressed blocks turn dedup, delta sync, and resumability into one mechanism.

Takeaway: Splitting files into content-addressed blocks makes deduplication, delta sync, and resumable upload the same mechanism — and separating the metadata plane from the block plane lets control and data paths scale independently.

YouTube

YouTube — resumable upload, async transcode ladder, and status

YouTube's upload uses Google's resumable upload protocol: the client starts a session and gets a session URI, then uploads bytes in chunks; if the connection drops, it queries the session for the current offset and resumes — no restart. This is the same offset-based resume contract as tus, applied at planet scale, and it's why a creator on a flaky connection can upload a multi-gigabyte 4K video reliably.
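The offset-based resume loop looks roughly like this. In the real protocol the client initiates a session, sends chunks with a `Content-Range` header, and after a drop asks the session how many bytes it holds. Here a hypothetical in-memory `Session` class plays the server and simulates one dropped connection. The 256 KiB chunk size reflects the granularity Google documents for non-final chunks in its resumable uploads.

```python
CHUNK = 256 * 1024  # non-final chunks sized in multiples of 256 KiB


class Session:
    """In-memory stand-in for a resumable upload session (hypothetical)."""

    def __init__(self, fail_at: int = -1):
        self.received = bytearray()
        self.fail_at = fail_at  # drop the connection once when a chunk crosses this offset

    def committed(self) -> int:
        """Answer to the status query: how many bytes the server holds."""
        return len(self.received)

    def put(self, range_header: str, chunk: bytes) -> None:
        start = int(range_header.split()[1].split("-")[0])
        if start != len(self.received):
            raise ValueError("chunk does not start at the committed offset")
        if 0 <= self.fail_at < start + len(chunk):
            self.fail_at = -1
            # Simplification: nothing from the failed chunk is kept. A real server
            # may keep part of it, which is why the client always re-queries.
            raise ConnectionError("connection dropped mid-chunk")
        self.received += chunk


def content_range(start: int, length: int, total: int) -> str:
    return f"bytes {start}-{start + length - 1}/{total}"


def upload(data: bytes, session: Session, chunk_size: int = CHUNK, max_retries: int = 5) -> int:
    """Send data in chunks, resuming from the server's offset after a drop; return retries used."""
    retries = 0
    while session.committed() < len(data):
        offset = session.committed()  # always resume from what the server reports
        chunk = data[offset:offset + chunk_size]
        try:
            session.put(content_range(offset, len(chunk), len(data)), chunk)
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
    return retries
```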

Once bytes land, the work is overwhelmingly asynchronous. A single source file is transcoded into a ladder of resolutions and codecs (from low-bitrate mobile up to 4K/8K, across multiple codecs) so playback can adapt to each viewer's device and bandwidth via adaptive-bitrate streaming. Transcoding a long, high-resolution video takes minutes, so the upload API returns immediately and the creator watches an explicit status progress through processing — exactly the "a completed upload is not a ready asset" contract. Lower resolutions are typically published first so the video is watchable while higher-quality renditions finish.
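"Publish low-res first" mostly comes down to job ordering plus a rule for when the asset becomes playable. The ladder below is hypothetical, since real ladders vary by title and codec. The policy that playback opens once the lowest rung is finished is an assumption used to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Rendition:
    height: int
    bitrate_kbps: int


# Hypothetical ladder; real ladders vary per title, codec and device mix.
LADDER = [
    Rendition(2160, 16000),
    Rendition(1080, 5000),
    Rendition(720, 2800),
    Rendition(480, 1200),
    Rendition(240, 400),
]


def plan_jobs(source_height: int) -> List[Rendition]:
    """Renditions to transcode, lowest first, never upscaling past the source."""
    return sorted((r for r in LADDER if r.height <= source_height), key=lambda r: r.height)


def playable(completed: Set[int], jobs: List[Rendition]) -> List[int]:
    """Heights a player may offer now; nothing until the lowest rung is done (assumed policy)."""
    if not jobs or jobs[0].height not in completed:
        return []
    return [r.height for r in jobs if r.height in completed]
```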

Delivery is then segment-based adaptive streaming from a global CDN, with the heavy transcoded variants cached at the edge. The architecture is the textbook large-blob pipeline: resumable direct ingest → async transcode fan-out → status-driven UX → CDN segment delivery — just at extraordinary scale.

Takeaway: Resumable chunked ingest plus a fully async transcode ladder (publish low-res first, finish high-res later) with an explicit processing status is the blueprint for large media — the upload returns fast and the processor owns the path to "ready".

Imgur / UGC platforms

User-generated image hosts — the quarantine-and-scan trust boundary

Image- and file-hosting platforms that accept anonymous or low-friction uploads live or die by their trust boundary, because they serve user content back to other users at massive scale — the ideal conditions for content-based attacks. The hard-won lesson across this category is that the client's declared content-type and extension cannot be trusted, and content must be validated before it is ever served.

The standard architecture matches this pattern's two-bucket model: uploads land in a quarantine bucket that is never on the served path; a scanning pipeline sniffs the true MIME from magic bytes (rejecting an HTML file masquerading as a PNG), runs malware and content-policy checks, often re-encodes/normalises the image (stripping metadata and any embedded payloads by decoding and re-encoding to a known-good format), and only then promotes the sanitised asset to a trusted bucket served from a separate domain via CDN. Serving user content from a different origin than the application is itself a defence — even if a malicious file slips through, it can't execute in the app's security context, neutralising stored-XSS.

This category also leans hard on CDN delivery with signed/cached access and aggressive deduplication (the same meme uploaded a million times is stored once), tying together the trust-boundary and cost themes of the pattern.

Takeaway: For user-generated content, treat every upload as hostile: quarantine it, sniff the real MIME and scan, re-encode to strip payloads, then serve the sanitised asset from a separate origin via CDN — content validation before serving is non-negotiable.

Decision table

Blob handling separates upload, processing, trust, and delivery — design each.

Decision	Default	When to change it	Robust answer includes
Upload path	Signed direct-to-blob URL	Tiny payloads can go through the app	Method/key/type/size/expiry constraints
Large upload	Multipart / resumable	Single PUT below ~100 MB	Part retries and abandoned-upload cleanup
Processing	Async event → queue → worker	Inline only for trivial validation	Status model, idempotency, retries, DLQ
Trust boundary	Untrusted → scan → trusted bucket	Private internal files may skip it	MIME sniffing, AV scan, separate origin
Delivery	Private bucket + signed CDN URL	Truly public assets can be cached open	Signed URLs/cookies, range requests
Cost	Lifecycle tiering + dedup	Small/short-lived corpora may not need it	Class transitions, multipart abort, hashing

Enforce content-length and content-type in the signed URL, not just in the client.
A completed upload is not a ready asset — the processor owns the transition to "ready".

Drills

Why signed URLs instead of streaming through the app?

Three reasons. Bandwidth — the app fleet would become the transfer bottleneck and cost centre for traffic it should never carry. Memory — large uploads buffer to RAM and OOM, or tie up a worker and disk for the whole transfer. Timeout/resumability — a big upload over a mobile network can outlast the HTTP timeout, and a dropped connection loses everything; signed URLs hand the transfer to storage and multipart gives per-part retries. The app spends a millisecond minting a constrained URL; storage handles the ingest.

An attacker has a leaked signed upload URL. What's the damage?

It depends entirely on the constraints. If the URL is scoped to PUT + the exact key + content-type + content-length + a 15-minute expiry, the worst they can do is overwrite that one key with content of matching type and size within 15 minutes — blast radius of one object. Without those constraints (just "upload to the bucket"), they can upload anything, anywhere in the bucket, for the URL's lifetime — polluting it with arbitrary, possibly malicious content. Constrain the URL to shrink the blast radius.

Why is processing always asynchronous?

Scanning and transcoding take seconds to minutes; holding an HTTP connection open for them ties up a worker, risks a timeout, and couples upload latency to processing. Instead the upload completes fast, blob storage emits an event to a queue, and idempotent workers process it while the user watches a status indicator. The upload endpoint returns a handle; the processor owns the transition to ready.

Why a two-bucket trust boundary, and what does the scanner check?

Because the client's content-type and extension are attacker-controlled claims — an HTML file labelled image/png, served from your origin, is stored XSS. Uploads land in an untrusted bucket that is never served; a scanner sniffs the real MIME from magic bytes (ignoring the declared type), runs AV/policy checks and size limits, often re-encodes images to strip embedded payloads, and only promotes passing files to a trusted bucket served from a separate origin. Even a slip-through can't execute in the app's security context.

You forgot one lifecycle rule and your storage bill is mysteriously high. Which rule?

The abort-incomplete-multipart rule. When a client initiates a multipart upload and never calls Complete (closed the tab, crashed), the uploaded parts persist in storage billed indefinitely — and they're invisible because there's no final object to see in a normal listing. A lifecycle rule that aborts incomplete multipart uploads after a few days reclaims them. "Abandoned multipart uploads are a bill."

How do you make the processing worker safe to retry?

Idempotency keyed by upload_id + rendition. The completion event is at-least-once and workers retry on failure, so the worker first checks whether the output already exists with a matching checksum; if so it marks ready and returns instead of re-transcoding. DB writes are upserts on the same key, not blind inserts. Genuine repeated failures route to a DLQ for inspection rather than looping forever. This makes duplicate deliveries and retries harmless.
