Intermediate · deep dive

Large blob handling

Multipart upload, object storage, metadata stores, signed URLs, and lifecycle policies.

~10 min read

Files above a few MB should never touch your app server. Every byte that flows through your app is bandwidth you pay for, memory you stress, and latency you inflict.

Read this if your last attempt…

You wrote "upload to our server; our server writes to S3"
You don't know what a signed URL is
You can't explain chunked / resumable uploads
You haven't thought about content-type spoofing or file-size limits

The concept

The anti-pattern: client POSTs a 500 MB video to your app server; app server streams it to S3. You just paid for bandwidth, memory, and tens of seconds of latency, and if your app process dies mid-upload, the user starts over.

The right pattern: direct-to-blob with signed URLs.
1. Client asks app for an upload URL (POST /uploads with metadata: size, content-type, filename).
2. App validates (size limits, content-type allowlist, auth), creates a DB row with status=pending, generates a pre-signed S3 PUT URL scoped to that specific object key, and returns it.
3. Client PUTs the file directly to S3 — app server is out of the data path.
4. S3 emits an upload-complete event (to a Lambda, SQS queue, or SNS topic) → app runs virus scan / transcoding / indexing, then marks the DB row status=ready.
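Step 2 is where the security work happens. Here is a minimal Python sketch of that validation. It assumes a hypothetical allowlist, a 10 GiB cap, and an in-memory dict standing in for the uploads table. The key layout `uploads/{user_id}/{upload_id}` is also an assumption. Generating the actual pre-signed URL is left to your storage SDK.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass

# Hypothetical policy values; tune per product.
MAX_UPLOAD_BYTES = 10 * 1024**3  # 10 GiB cap
ALLOWED_MIME = {"video/mp4", "video/webm", "image/png", "image/jpeg"}


class UploadRejected(Exception):
    """Raised when a request fails validation; no URL is issued."""


@dataclass
class UploadRow:
    id: str
    user_id: str
    object_key: str
    filename: str
    size: int
    mime: str
    status: str = "pending"


# In-memory stand-in for the uploads table (hypothetical).
uploads: dict[str, UploadRow] = {}


def create_upload(user_id: str | None, filename: str, size: int, mime: str) -> UploadRow:
    """Validate metadata, then record a pending upload. No file bytes are involved."""
    if not user_id:
        raise UploadRejected("not authenticated")
    if not 0 < size <= MAX_UPLOAD_BYTES:
        raise UploadRejected(f"size {size} outside (0, {MAX_UPLOAD_BYTES}]")
    if mime not in ALLOWED_MIME:
        raise UploadRejected(f"content-type {mime!r} not allowed")
    upload_id = uuid.uuid4().hex
    # The server picks the key; the client-supplied filename is stored as metadata only.
    key = f"uploads/{user_id}/{upload_id}"
    row = UploadRow(upload_id, user_id, key, filename, size, mime)
    uploads[upload_id] = row
    return row
```

You then pass `row.object_key`, `row.mime` and `row.size` to the presign call. That scopes the URL to one object, one type and one size.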

Architecture diagram: Direct-to-blob with signed URL

App issues the URL; client uploads directly to S3; app never sees the bytes. S3 event triggers post-processing.
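The event-driven half of the diagram can be sketched as a Lambda-style handler. The record fields (`eventName`, `s3.bucket.name`, `s3.object.key`, `s3.object.size`) follow S3's event notification format. The `scan_queue` list is a hypothetical stand-in for SQS or whatever job queue you use. The handler only enqueues work; the slow processing runs elsewhere.

```python
from urllib.parse import unquote_plus

# Hypothetical job queue; in production this might be SQS.
scan_queue: list = []


def handler(event: dict, context: object = None) -> int:
    """Lambda-style entry point for S3 ObjectCreated notifications; returns jobs enqueued."""
    enqueued = 0
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        scan_queue.append({
            "bucket": bucket,
            "key": key,
            "size": record["s3"]["object"].get("size"),
        })
        enqueued += 1
    return enqueued
```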

Upload strategies.

Strategy	Good for	Bad for
POST through app server	Tiny files (<1 MB), tight auth	Anything bigger — bandwidth + memory tax
Signed URL PUT	Most files up to ~100 MB	No resume on failure
Multipart / resumable	Large files, flaky networks, progress UI	Small files (overhead not worth it)
Browser direct with tus.io	Consumer apps with progress UX	Requires tus server
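The table's rules of thumb can be condensed into a decision function. The thresholds and flag names are assumptions taken from the table, not hard limits. The one hard limit worth remembering is that S3 rejects a single PUT above 5 GB, so anything bigger has to be multipart. The 100 MB threshold already routes those files to multipart.

```python
MB = 1024 * 1024

# Rule-of-thumb thresholds taken from the table above (assumptions, not hard limits).
APP_POST_MAX = 1 * MB
SIGNED_PUT_MAX = 100 * MB


def choose_strategy(size_bytes: int, flaky_network: bool = False,
                    needs_progress_ui: bool = False) -> str:
    """Pick an upload strategy from file size and client conditions."""
    if size_bytes < APP_POST_MAX:
        return "post_through_app"
    if size_bytes > SIGNED_PUT_MAX or flaky_network or needs_progress_ui:
        return "multipart"
    return "signed_put"
```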

How interviewers grade this

Client uploads directly to blob storage via signed URL; app is out of the data path.
App validates size + content-type BEFORE issuing the URL.
Signed URL constrains content-type and max size.
Post-processing (scan, transcode, index) is event-driven, not synchronous.
Large files use multipart / resumable upload.

Variants

Signed URL PUT (simple)

One signed URL; one PUT; done.

The default for files up to ~100 MB. Works with any S3-compatible blob store. Client gets a URL, PUTs, we get an event.
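The signed URL acts as a security boundary because the signature covers the key, the expiry, and the headers the client must send. The toy HMAC scheme below shows the idea. It is not AWS Signature Version 4. `SECRET`, `HOST`, and both function names are hypothetical. Real S3 gets the same effect when Content-Type (and optionally Content-Length) are included as signed headers.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical key shared by the app and the blob store
HOST = "https://blob.example.com"  # hypothetical endpoint


def _sign(method: str, key: str, content_type: str, content_length: int, expires: int) -> str:
    # The signature covers the headers the uploader must send, so changing any of them breaks it.
    msg = "\n".join([method, key, content_type, str(content_length), str(expires)])
    return hmac.new(SECRET, msg.encode(), hashlib.sha256).hexdigest()


def presign_put(key: str, content_type: str, content_length: int,
                ttl_s: int = 3600, now: float | None = None) -> str:
    """App side: issue a URL valid for one key, one type, one size, until expiry."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_s
    sig = _sign("PUT", key, content_type, content_length, expires)
    return f"{HOST}/{quote(key)}?" + urlencode({"expires": expires, "sig": sig})


def verify_put(url: str, content_type: str, content_length: int,
               now: float | None = None) -> bool:
    """Store side: recompute the signature from the headers actually sent."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    key = unquote(parts.path.lstrip("/"))
    expires = int(query["expires"][0])
    if now > expires:
        return False
    expected = _sign("PUT", key, content_type, content_length, expires)
    return hmac.compare_digest(expected, query["sig"][0])
```

Verification fails if the client changes any bound value or arrives after expiry. That is the property you rely on when you say S3 enforces the constraint.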

Pros

+Dead simple
+Zero bytes through your app
+Works everywhere

Cons

−No resume on failure
−Bad for multi-GB files

Choose this variant when

Files up to ~100 MB
Stable networks

S3 multipart / tus resumable

Split into chunks; upload each; atomic complete.

For anything big. 5 MB minimum chunk for S3 multipart (except the last). Failed chunks retry independently. Client state tracks uploaded parts for resume.
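Planning the parts is mostly arithmetic against S3's documented limits:

- at least 5 MiB per part, except the last
- at most 5 GiB per part
- at most 10,000 parts per upload

The sketch below assumes a 10 MiB preferred part size. That default is hypothetical. The part size grows only when a file would otherwise need more than 10,000 parts.

```python
from __future__ import annotations

import math

MIB = 1024 * 1024
MIN_PART = 5 * MIB          # S3 minimum part size (all parts except the last)
MAX_PART = 5 * 1024 * MIB   # S3 maximum part size
MAX_PARTS = 10_000          # S3 maximum number of parts per upload


def plan_parts(total_size: int, preferred_part: int = 10 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples; part numbers start at 1."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part = max(preferred_part, MIN_PART)
    # Grow the part size if the preferred size would exceed the part-count limit.
    part = max(part, math.ceil(total_size / MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    plan = []
    offset, number = 0, 1
    while offset < total_size:
        length = min(part, total_size - offset)  # only the last part may be short
        plan.append((number, offset, length))
        offset += length
        number += 1
    return plan
```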

Pros

+Resumable on failure
+Parallel chunk uploads for speed
+Progress UI natural

Cons

−More complex client logic
−Abandoned uploads need cleanup policy (S3 lifecycle rule)

Choose this variant when

Files > 100 MB
Flaky networks (mobile, international)
User expects a progress bar

Pre-signed POST (HTML form)

Browser posts a form directly to S3 with signed policy.

Legacy pattern; pre-signed PUT is generally cleaner. Useful when you need a plain HTML form (no JS).
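The "signed policy" is a JSON document listing conditions that S3 checks against the form fields. The sketch below builds one, using hypothetical bucket and key names. One thing a POST policy does well is express a size range via `content-length-range`. The SigV4 signing step and its required `x-amz-*` credential conditions are deliberately omitted; let your SDK handle those.

```python
from __future__ import annotations

import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket: str, key: str, content_type: str, max_bytes: int,
                      ttl: timedelta = timedelta(hours=1),
                      now: datetime | None = None) -> str:
    """Return the base64 policy string sent as the form's "policy" field (unsigned)."""
    now = now or datetime.now(timezone.utc)
    policy = {
        "expiration": (now + ttl).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            ["eq", "$key", key],
            {"Content-Type": content_type},          # exact match
            ["content-length-range", 1, max_bytes],  # S3 rejects files outside this range
        ],
    }
    return base64.b64encode(json.dumps(policy).encode()).decode()
```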

Pros

+Works without JS
+Well-documented

Cons

−More complex signing (policy doc)
−Less flexible than PUT URLs

Choose this variant when

Progressive enhancement
JS-free form uploads

Worked example

Scenario: video upload for a course platform. Videos 50 MB – 5 GB.

Flow:

1. Client POST /uploads with {filename, size, mime}. App checks size ≤ 10 GB, mime in [video/mp4, video/webm, …], auth OK. Inserts DB row uploads (id, user_id, status=pending, size, mime).
2. App creates a multipart upload in S3 (CreateMultipartUpload), returns upload_id + chunk-signing endpoint.
3. Client splits file into 10 MB chunks. For each chunk, requests a signed UploadPart URL from app; PUTs the chunk. App authorises per-chunk.
4. Client POSTs /uploads/:id/complete with part ETags. App calls S3 CompleteMultipartUpload.
5. S3 event → Lambda → enqueue transcoding job (MediaConvert or FFmpeg on containers). Transcode into HLS/DASH renditions (240p, 480p, 720p, 1080p). Write manifests + media segments to a CDN-fronted bucket.
6. App updates row status=ready + CDN URL.
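Steps 3 and 4 need client-side bookkeeping so a crashed or backgrounded app can resume. The sketch below holds that state and assumes the client persists it somewhere durable; the class is hypothetical. S3's CompleteMultipartUpload expects the parts list in ascending PartNumber order, each entry carrying the ETag returned by its UploadPart response.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MultipartState:
    """Client-side resume state for one multipart upload (persist it between sessions)."""
    upload_id: str
    total_parts: int
    etags: dict[int, str] = field(default_factory=dict)

    def record(self, part_number: int, etag: str) -> None:
        if not 1 <= part_number <= self.total_parts:
            raise ValueError(f"part {part_number} out of range")
        self.etags[part_number] = etag  # a retried part simply overwrites its ETag

    def remaining(self) -> list[int]:
        """Parts still to upload; this is what a resumed session works through."""
        return [n for n in range(1, self.total_parts + 1) if n not in self.etags]

    def completion_parts(self) -> list[dict]:
        """Body for /uploads/:id/complete, sorted by PartNumber as S3 requires."""
        missing = self.remaining()
        if missing:
            raise RuntimeError(f"missing parts: {missing}")
        return [{"PartNumber": n, "ETag": self.etags[n]} for n in sorted(self.etags)]
```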

Cleanup: S3 lifecycle rule aborts multipart uploads abandoned > 24h. uploads rows in pending > 7d are deleted by a sweeper job.
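The sweeper half of the cleanup is sketched below over plain dicts; the row shape is an assumption. In Postgres it would be roughly `DELETE FROM uploads WHERE status = 'pending' AND created_at < now() - interval '7 days'`. Returning the deleted ids lets you log or reconcile them.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

PENDING_TTL = timedelta(days=7)


def sweep_stale_pending(rows: list[dict], now: datetime) -> tuple[list[dict], list[str]]:
    """Split rows into (kept rows, deleted ids); only stale 'pending' rows are deleted."""
    cutoff = now - PENDING_TTL
    keep, deleted = [], []
    for row in rows:
        if row["status"] == "pending" and row["created_at"] < cutoff:
            deleted.append(row["id"])
        else:
            keep.append(row)
    return keep, deleted
```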

Security:

Signed URLs scoped to one specific object key, issued only to the authenticated uploader, with a 1-hour expiry.
Content-type enforced at the signed-URL level (PUT with wrong content-type rejected by S3).
Virus scan on the raw bucket before moving to the clean bucket.
Transcoded-output bucket is behind CloudFront with signed URLs for paid content.

Good vs bad answer

Interviewer probe

“How do users upload 500 MB videos to your app?”

Weak answer

"They POST to our /upload endpoint, and we stream it to S3."

Strong answer

"Direct-to-S3 via signed URLs — the app server never sees the bytes. Client POSTs metadata; we validate size + mime + auth; issue an S3 multipart InitiateUpload + per-chunk signed URLs. Client PUTs chunks direct to S3, calls CompleteMultipartUpload. S3 event triggers a Lambda that marks the DB row ready and enqueues transcoding. For files this big, multipart is mandatory — single PUT of 500 MB fails often on mobile. The app is out of the data path; our bandwidth bill is metadata + events only. Security: per-chunk signing, content-type enforced on the URL, short expiry, virus scan before publishing."

Why it wins: Direct-to-blob, multipart, security boundaries, event-driven post-processing, honest about bandwidth cost.

Interview playbook: 2–3 min on any upload- or media-heavy product

When it comes up

Users upload images, video, documents, or any large file
The interviewer says "store and serve user media"
A flow involves files bigger than a few MB
Profile pictures, attachments, video uploads, backups

Order of reveal

1. Keep the app out of the data path. The app never streams the bytes. The client uploads directly to blob storage via a pre-signed URL.
2. Validate before issuing. Before I hand out the URL, the app checks size limits, a content-type allowlist, and auth, and bakes those constraints into the signed URL so S3 enforces them.
3. Multipart for large files. Anything over ~100 MB uses multipart / resumable upload, so a network blip retries one chunk instead of the whole file.
4. Event-driven post-processing. The upload-complete event triggers virus scan, transcoding, thumbnailing, and indexing; none of it blocks the request.
5. Downloads. Serve via signed GET URLs fronted by a CDN; I only proxy bytes through the app when DRM or per-request watermarking demands it.

Signature phrases

“The bytes never flow through my app — signed URL, direct to S3.” — The core pattern that separates a scalable design from a toy one.
“I validate size and content-type before issuing the URL, and S3 enforces both.” — Shows the signed URL is a security boundary, not a convenience.
“Files over ~100 MB are multipart so a blip retries one chunk.” — Demonstrates you handle real-world flaky uploads.
“Post-processing is event-driven off the upload-complete event.” — Keeps the write path fast and decoupled.

Likely follow-ups

“Why not just let the app receive the upload? It is simpler.”

It is simpler until you have ten concurrent 500 MB uploads pinning gigabytes of memory and tying up worker threads for minutes each, or one slow mobile client holding a connection open for 20 minutes. App-in-the-path does not survive past toy load: you pay the bandwidth twice (in and out), stress memory, and a process restart loses the whole upload. Direct-to-blob scales trivially and is the industry norm.

“How do you stop a user uploading a 10 GB malicious payload?”

Layers: (1) a content-type allowlist checked before the URL is issued; (2) a max-size constraint baked into the signed URL so S3 rejects overruns at write time; (3) a virus scan fired from the upload-complete event before the object is exposed to anyone else; (4) per-user upload rate limits and an account quota so a free tier cannot be flooded with junk. No single check is enough — the URL constraint plus the post-upload scan are the key two.
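Layer (4) can be as simple as a per-user sliding window plus a byte quota, checked when the URL is requested. The sketch below uses hypothetical limits. It reserves the declared size up front. A real system would reconcile that reservation against the actual object size from the upload-complete event. It would also keep the counters in a shared store rather than in process memory.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class UploadLimiter:
    """Per-user sliding-window rate limit plus a byte quota (limits are hypothetical)."""

    def __init__(self, max_uploads: int, window_s: float, quota_bytes: int):
        self.max_uploads = max_uploads
        self.window_s = window_s
        self.quota_bytes = quota_bytes
        self.recent: defaultdict[str, deque] = defaultdict(deque)  # user -> URL-issue times
        self.used: defaultdict[str, int] = defaultdict(int)        # user -> bytes reserved

    def allow(self, user: str, size: int, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.recent[user]
        while window and now - window[0] >= self.window_s:
            window.popleft()  # drop requests that fell out of the window
        if len(window) >= self.max_uploads or self.used[user] + size > self.quota_bytes:
            return False
        window.append(now)
        self.used[user] += size
        return True
```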

“How do you serve private downloads, say paid course videos?”

Pre-signed GET URLs with a short expiry, fronted by a CDN with signed-URL/signed-cookie support (CloudFront). The client gets a time-boxed URL, the CDN validates the signature and serves from the edge, and the origin bucket stays private. I only stream bytes through the app when I need per-request logic the CDN cannot do — DRM license issuance or per-user watermarking — and even then only the license step, not the media payload.

Common mistakes

Proxying uploads through the app

Your app pays for the bandwidth, memory for streaming, and any failure restarts the whole upload. Use signed URLs.

No content-type / size enforcement on the signed URL

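Client can upload an executable claiming it's an image. Bind content-type and size to the signed URL (sign the exact Content-Length on a PUT, or add a content-length-range condition to a POST policy) so S3 enforces them at write time.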
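The signed URL only binds the declared Content-Type header; S3 never inspects the bytes. A post-upload check can close that gap. It runs from the upload-complete event, reads the first few bytes (a ranged GET is enough), and compares the file's magic number against the declared type. The sketch below covers a few allowlisted types, which is an assumption. The byte signatures are standard, but the function names are hypothetical, and real systems usually rely on a dedicated detection library.

```python
from __future__ import annotations


def sniff_mime(head: bytes) -> str | None:
    """Guess a type from leading magic bytes; covers only a few allowlisted formats."""
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x1a\x45\xdf\xa3"):
        return "video/webm"  # EBML header; Matroska (.mkv) shares it
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "video/mp4"   # ISO BMFF box; .mov/.m4a match too
    return None


def content_matches_declared(head: bytes, declared: str) -> bool:
    """True only when the sniffed type equals the Content-Type the client declared."""
    return sniff_mime(head) == declared
```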

Abandoned multipart uploads left forever (Advanced)

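Each abandoned multipart upload keeps its already-uploaded parts in storage, and you pay for them even though no object appears in the bucket. Add an S3 lifecycle rule (AbortIncompleteMultipartUpload) to abort them after N days.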
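The rule itself is a small piece of configuration. The sketch below builds it in Python, in the shape boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID is hypothetical. Lifecycle rules count whole days, so the worked example's "> 24h" becomes `DaysAfterInitiation: 1`.

```python
import json


def abort_incomplete_mpu_rule(days: int = 1, prefix: str = "") -> dict:
    """Lifecycle configuration that aborts multipart uploads left incomplete for `days` days."""
    if days < 1:
        raise ValueError("lifecycle rules count whole days; minimum is 1")
    return {
        "Rules": [{
            "ID": "abort-incomplete-multipart",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},        # "" applies the rule to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
        }]
    }


# With boto3 installed, this dict is the LifecycleConfiguration argument to
# s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...).
print(json.dumps(abort_incomplete_mpu_rule(), indent=2))
```

Putting a lifecycle configuration replaces the bucket's existing one, so merge this rule with any rules you already have.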

Serving private blobs by proxying

Same pattern, download direction. Use signed GET URLs + CDN; don't stream bytes through your app. Exception: per-request watermarking or DRM that genuinely needs server-side assembly.

Practice drills

Walk me through an upload of a 2 GB video from a mobile client on 4G.

Client POSTs metadata → app validates → creates S3 multipart upload, returns upload_id. Client splits the file into (say) about 200 chunks of 10 MB; small chunks keep the cost of a retry low on a flaky 4G link. For each chunk, the client requests a signed UploadPart URL and PUTs to S3. If a chunk fails (tunnel, network flip), it retries that chunk only. On completion, the client calls /uploads/:id/complete; app calls S3 CompleteMultipartUpload. S3 fires event; Lambda enqueues HLS transcoding; app marks ready once transcoded. User sees progress through all this, backed by real per-chunk acks.
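"Retries that chunk only" usually means exponential backoff with jitter around a single part. The sketch below assumes a hypothetical `put_part` callable that fetches the signed UploadPart URL, PUTs the bytes, and returns the ETag. Which exceptions count as retryable depends on your HTTP client, so `ConnectionError` and `TimeoutError` are placeholders. The injectable `sleep` exists only to make the function testable.

```python
import random
import time
from typing import Callable


def upload_part_with_retry(
    put_part: Callable[[int], str],
    part_number: int,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Upload one part, retrying only that part on transient errors; return its ETag."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return put_part(part_number)
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # give up for now; the client can resume this part later
            # "Full jitter": sleep a random time up to base * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise RuntimeError("unreachable")
```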

Interviewer: "can't we just let the app handle uploads? It's simpler."

"Simpler" right until you have 10 concurrent 500 MB uploads holding 5 GB of memory and blocking your worker pool. Or one slow client keeping a connection open for 20 min. The app-in-path pattern does not scale past toy loads. Direct-to-blob scales trivially and is the industry norm.

How do you stop users from uploading a 10 GB malicious payload?

Layered: (1) content-type allowlist check before issuing the URL; (2) max-size bound baked into the signed URL — S3 rejects overruns; (3) virus scan on S3-event post-upload before exposing to other users; (4) rate-limit uploads per user (abuse pattern = flood a free tier with trash); (5) quota per account.

Cheat sheet

•App never sees the bytes. Direct-to-blob via signed URL.
•Small (<100 MB): signed PUT. Large: multipart / tus.
•Validate size + mime + auth BEFORE issuing the URL.
•Constrain content-type and max size ON the URL.
•Event-driven post-processing (virus scan, transcode, index).
•S3 lifecycle cleans up abandoned multipart uploads.
•Downloads: signed GET + CDN. Don't proxy.
