Large file upload & blob handling
Chunked, resumable uploads direct to blob storage via signed URLs. The app server never touches the bytes; processing is always async.
When to reach for this
Reach for this when…
- User-uploaded media (video, images, audio)
- Document processing and attachments
- Backup / archive ingest
- Any single file larger than ~10 MB
- Uploads that need progress, resume, or scanning before use
Not really this pattern when…
- Tiny payloads (< ~1 MB) embedded in a JSON request — just post the bytes
- Server-generated content (there is nothing to upload)
- A trusted internal batch transfer where a signed URL adds no value
Good vs bad answer
Interviewer probe
“How does video upload and processing work?”
Weak answer
"The client POSTs the video file to /upload and the server saves it to disk, then transcodes it before returning."
Strong answer
"The bytes never touch our app. The client requests a multipart upload from our API; the server returns short-lived signed URLs per 5 MB part, each constrained to the exact key, content-type, and size. The client uploads parts in parallel directly to S3 and retries only failed parts, then calls CompleteMultipartUpload. S3 emits an ObjectCreated event to SQS, and an idempotent worker (keyed by upload_id) AV-scans, sniffs the real MIME, transcodes to HLS variants, generates thumbnails, and flips the DB status from uploading to ready. The user sees an async progress indicator over SSE. Untrusted uploads land in a quarantine bucket; only post-scan files are promoted to the trusted, CDN-served bucket on a separate origin. Delivery is signed CDN URLs with range requests for seeking. Lifecycle rules abort incomplete multipart after 7 days and age old renditions to cold storage. Processing failures go to a DLQ."
Why it wins: Names signed multipart with constrained URLs, async idempotent processing, the two-bucket trust boundary with MIME sniffing, status-driven UX, signed CDN delivery, and lifecycle/cost hygiene. The weak answer streams bytes through the app and blocks the request on transcoding — the two cardinal sins.
Cheat sheet
- Bytes bypass the app: always signed direct-to-blob upload.
- Signed URL constrains method + key + content-type + content-length + short expiry.
- Single PUT under ~100 MB; multipart/resumable above it.
- Multipart = parallel parts + per-part retries; a blip costs one part, not the file.
- Processing is always async: blob event → queue → idempotent worker.
- Status model: uploading → uploaded → processing → ready/failed.
- A completed upload is not a ready asset; the processor owns the transition.
- Two-bucket trust boundary: untrusted → MIME sniff + AV scan → trusted.
- The declared MIME type is a claim; sniff magic bytes server-side.
- Serve from a separate origin via signed CDN URLs; buckets private by default.
- Lifecycle rule: abort incomplete multipart after ~7 days (phantom cost).
- Tier Standard → IA → Glacier and dedup by content hash to control cost.
Core concept
The single organising principle of this pattern is that the bytes bypass your application. Your app fleet is sized for small, fast JSON requests; routing multi-gigabyte files through it makes every instance a bandwidth bottleneck, an out-of-memory risk, and a hostage to slow-client timeouts. So the app's job shrinks to issuing permission and reacting to completion, while blob storage does the heavy lifting of ingest.
The client uploads bytes straight to S3/GCS via a short-lived signed URL; the app only issues the URL and processes the result asynchronously.
The canonical flow:
1. Client requests a signed upload URL. POST /uploads with filename, content-type, and size. The server returns a short-lived (≈15-minute) pre-signed S3/GCS URL plus an upload_id, and records a row in status uploading.
2. Client PUTs bytes directly to blob storage. A single PUT for small files, multipart for large. Your fleet never sees a byte of payload.
3. Blob storage emits a completion event (S3 → SQS/EventBridge, GCS → Pub/Sub) when the object lands.
4. A processor worker consumes the event and runs the pipeline asynchronously: virus scan → transcode → thumbnail → metadata extraction → flip the DB row from uploading to ready.
Multipart / resumable uploads are how large files survive flaky networks:
A large file is split into N parts, each uploaded in parallel with its own signed URL; only failed parts retry, then a single Complete call assembles them.
The client splits the file into N parts (5 MB+ each), uploads them in parallel using per-part signed URLs, retries only the parts that fail, and finishes with a single CompleteMultipartUpload call that assembles them by ETag. A dropped connection costs one part, not the whole file. (tus.io and GCS resumable uploads are the same idea with a different API.)
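As a rough illustration of the splitting step, the sketch below plans the byte ranges a client would upload. The names `Part` and `plan_parts` are hypothetical, and the only provider rule assumed is the 5 MB minimum part size mentioned above (treated here as 5 MiB).

```python
from dataclasses import dataclass
from typing import List

MIN_PART_SIZE = 5 * 1024 * 1024  # assumed floor for every part except the last


@dataclass(frozen=True)
class Part:
    number: int  # 1-based part number, as multipart APIs expect
    offset: int  # byte offset of the part within the file
    length: int  # number of bytes in the part


def plan_parts(file_size: int, part_size: int = MIN_PART_SIZE) -> List[Part]:
    """Split a file of file_size bytes into consecutive parts."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the minimum part size")
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        # The final part takes whatever is left and may be smaller.
        length = min(part_size, file_size - offset)
        parts.append(Part(number, offset, length))
        offset += length
        number += 1
    return parts
```

Each planned part maps to one signed URL and one PUT; the list of part numbers is also what the client needs to remember in order to resume.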
Security is the part juniors skip — and it's the most important. A signed URL must be narrowed to a single method (PUT), a single object key, an exact content-type, and a content-length cap, with a short expiry. Otherwise one leaked URL with no constraints lets an attacker upload anything, anywhere in the bucket. And the client's declared content-type is a claim, not a fact: you must sniff the real MIME / magic bytes server-side and scan for malware before the file is ever served, which is why production systems use a two-bucket trust boundary.
Uploads land in an untrusted quarantine bucket; only after MIME sniffing and an AV scan are they promoted to the trusted, CDN-served bucket.
Uploads land in an untrusted quarantine bucket; only after passing MIME sniffing and an AV/policy scan are they promoted to a trusted bucket served from a separate origin via CDN. This prevents the classic stored-XSS attack where an HTML file uploaded as image/png gets served from your app's origin.
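A minimal sketch of server-side sniffing, assuming a deliberately tiny signature table (a real scanner would use a maintained library and a much longer list). The function names are hypothetical; the byte signatures are the standard leading bytes of each format.

```python
from typing import Optional

# Leading "magic bytes" for a few common formats (intentionally incomplete).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(head: bytes) -> Optional[str]:
    """Return the MIME type implied by the first bytes, or None if unknown."""
    for magic, mime in MAGIC_SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def declared_type_is_honest(head: bytes, declared: str) -> bool:
    """True only when the sniffed type matches what the client claimed."""
    return sniff_mime(head) == declared
```

An HTML payload labelled image/png fails this check because its first bytes do not match the PNG signature, so it would stay in quarantine rather than be promoted.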
Interview walkthrough
Worked example: design video upload and processing
Step 0 — the non-negotiable. The app server never receives the video bytes. Its job is to authorize and sign, then react to completion. Everything else is storage, queue, and workers.
Step 1 — request an upload session. POST /uploads with filename, content-type, and size. The server validates the request, creates a metadata row in status uploading, initiates an S3 multipart upload, and returns the upload_id plus per-part signed URLs — each constrained to method PUT, the exact key, content-type, and a content-length range, expiring in ~15 minutes.
Step 2 — upload parts directly to S3. The client splits the file into parts (e.g. 16 MB each), uploads them in parallel, and retries only parts that fail. A dropped connection costs one part. When all parts succeed, the client calls CompleteMultipartUpload with the ordered (part, ETag) list.
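The retry behaviour in Step 2 can be sketched as follows. This is a simplified, sequential version (real clients upload parts concurrently), and `put_part` is a hypothetical callable standing in for the HTTP PUT to a part's signed URL.

```python
from typing import Callable, Dict, Iterable


def upload_parts(
    part_numbers: Iterable[int],
    put_part: Callable[[int], str],
    max_attempts: int = 3,
) -> Dict[int, str]:
    """Upload every part, retrying only the ones that fail.

    put_part(number) is assumed to upload one part and return its ETag,
    raising ConnectionError on a network failure.
    """
    etags: Dict[int, str] = {}
    pending = list(part_numbers)
    for _ in range(max_attempts):
        failed = []
        for number in pending:
            try:
                etags[number] = put_part(number)
            except ConnectionError:
                failed.append(number)  # only this part goes into the next round
        if not failed:
            return etags
        pending = failed
    raise RuntimeError(f"parts still failing after {max_attempts} attempts: {pending}")
```

The returned mapping of part number to ETag is exactly what the Complete call needs, sorted by part number.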
Step 3 — completion event → async pipeline. S3 emits an ObjectCreated event to SQS. An idempotent worker (keyed by upload_id) consumes it. The object is in an untrusted quarantine bucket at this point.
Upload completion fans out through a queue to idempotent workers — scan, transcode, thumbnail, extract — each updating status; failures go to a DLQ.
Step 4 — scan, then process. The worker sniffs the real MIME from magic bytes, runs an AV/policy scan, and — on pass — transcodes the source into an HLS bitrate ladder, generates thumbnails, and extracts metadata. On fail, it deletes the object and marks the upload failed.
Step 5 — promote and flip status. Passing, processed assets are promoted to the trusted bucket (served from a separate origin). The worker flips the status uploading → uploaded → processing → ready. The client tracks this over SSE and shows progress; lower resolutions can be published first so the video is watchable while higher renditions finish.
A completed upload is not a ready asset — the processor owns the transition. Clients poll or subscribe to this state.
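One way to keep the status model honest is to encode the allowed transitions explicitly, so no code path can jump from uploading straight to ready. The sketch below assumes the five states named in this guide; the table and the `advance` helper are illustrative, not a prescribed schema.

```python
# Allowed next states for each status; terminal states allow none.
ALLOWED_TRANSITIONS = {
    "uploading": {"uploaded", "failed"},
    "uploaded": {"processing", "failed"},
    "processing": {"ready", "failed"},
    "ready": set(),
    "failed": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

In a real system the same check would sit in the conditional update against the metadata row, so concurrent workers cannot skip or rewind a state.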
Step 6 — deliver. Playback is served via signed CDN URLs from the private trusted origin, using range requests / adaptive-bitrate segments so players seek and stream without downloading the whole file.
Downloads are served from a CDN with signed URLs / cookies — private by default, cached at the edge, never through the app.
Step 7 — operate and control cost. Lifecycle rules abort incomplete multipart uploads after 7 days and age old renditions Standard → IA → Glacier. Identical source files are deduplicated by content hash. Processing failures route to a DLQ for inspection.
Result. Bytes never touch the fleet; uploads are resilient and resumable; content is scanned before it's ever served; the user gets a live status; delivery is edge-cached and private; and storage cost is bounded by lifecycle and dedup. Every cardinal sin — bytes through the app, blocking on transcode, trusting the MIME type, public buckets, phantom multipart cost — is designed out.
Interview playbook
When it comes up
- Files are large enough to threaten app bandwidth, memory, or request timeouts
- Uploads need resume, progress, scanning, processing, or CDN delivery
- The prompt mentions videos, images, documents, attachments, or backups
- A media platform, file sync, or UGC system is in scope
Order of reveal
1. Move bytes off the app. The app signs a constrained, short-lived URL; the client uploads direct to blob storage, so the fleet never touches the bytes.
2. Choose the upload mode. Single PUT under ~100 MB, multipart/resumable above it for parallel parts and per-part retries.
3. Process asynchronously. Completion emits an event to a queue; idempotent workers scan, transcode, and thumbnail while the user sees a processing status.
4. Define the trust boundary. Untrusted quarantine bucket → MIME sniff + AV scan → promote to a trusted bucket on a separate origin.
5. Model status explicitly. uploading → uploaded → processing → ready/failed; a completed upload is not a ready asset.
6. Deliver and control cost. Private buckets, signed CDN delivery with range requests; lifecycle tiering, multipart-abort, and dedup for cost.
Signature phrases
- “The app signs intent; storage receives bytes” — Captures the core control-plane / data-plane split.
- “Untrusted until scanned” — Signals security maturity and the two-bucket boundary.
- “A completed upload is not a ready asset” — Names the async status contract the processor owns.
- “The declared MIME type is a claim, not a fact” — Identifies the stored-XSS / disguised-content risk.
- “Abandoned multipart uploads are a bill” — Shows operational cost hygiene most candidates miss.
- “Downloads bypass the app too” — Extends the bypass principle to the delivery path via CDN.
Likely follow-ups
“What if the client loses network halfway through a large upload?”
Multipart upload makes this cheap: the client tracks which parts succeeded and re-requests signed URLs only for the missing parts, so a 5 GB upload that drops at 90% re-sends only the parts that were in flight, not the whole file. For cross-session resume it persists the upload_id and the completed-part list (or queries ListParts). Abandoned sessions are reclaimed by a lifecycle rule that aborts incomplete multipart uploads after a few days so the orphaned parts stop costing money.
“What if the processing worker runs twice on the same upload?”
The completion event is at-least-once and workers retry, so processing must be idempotent. I key the work by upload_id plus the rendition/output key; if the output already exists with a matching checksum, the worker marks the status ready instead of re-transcoding. DB updates are upserts on the same key. Repeated genuine failures go to a DLQ rather than retrying forever.
“How do you stop someone uploading a malicious file?”
Defence in depth. The signed upload URL constrains method, exact key, content-type, and content-length, so a leaked URL can only overwrite one key with matching-type, matching-size content briefly. After upload, the file sits in an untrusted quarantine bucket where I sniff the real MIME from magic bytes (ignoring the declared type), run an AV/policy scan, and for images re-encode to strip embedded payloads. Only passing files are promoted to the trusted bucket, which is served from a separate origin so even a slip can't execute in the app's security context.
“How do you keep storage cost under control at scale?”
Three levers. Lifecycle rules age objects Standard → Infrequent Access → Glacier/Archive based on access pattern, which can cut storage spend ~10× for archive-heavy data, plus an abort-incomplete-multipart rule to kill phantom parts. Content-addressed deduplication stores identical files once with reference counts and lets the client skip re-uploading parts whose hash already exists. And serving through a CDN offloads egress bandwidth from origin. I'd only age objects to cold tiers whose access truly justifies the retrieval latency and fetch fee.
“How does the user see progress while processing takes minutes?”
An explicit status model — uploading → uploaded → processing → ready/failed — exposed by the API separately from the result URLs. The client subscribes over SSE/websocket (or polls) for live updates. For long transcodes I publish lower resolutions first so the asset is usable while higher renditions finish, and the processor owns every transition so the upload endpoint can return immediately with just a handle.
Canonical examples
- YouTube / Vimeo video upload
- Profile-photo and avatar upload
- Dropbox / Google Drive file sync
- GitHub LFS / release-asset upload
- Attachment upload in chat and email apps
Variants
Single signed-PUT upload
One pre-signed PUT straight to blob storage — the simplest correct upload.
For files below the multipart cutoff (~100 MB), a single pre-signed PUT is the right tool — simpler than multipart with no assembly step. The app issues a URL constrained to method PUT, the exact object key, content-type, and a content-length range, with a 15-minute expiry; the client PUTs the bytes directly; blob storage emits a completion event.
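This covers the vast majority of consumer uploads: profile photos, document attachments, short audio. The only thing it gives up is resumability: a dropped connection means re-uploading the whole file, which is acceptable for small files but not for large ones. Keep the size cap in the signed request, not just in client validation, so a tampered client can't push a 100 GB object into your bucket. How the cap is expressed depends on the provider: S3 enforces a size range through a content-length-range condition in a pre-signed POST policy, while GCS accepts a signed x-goog-content-length-range header.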
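To make the idea of a constrained signed URL concrete, here is a toy HMAC scheme. It is not S3's SigV4 or GCS's signing algorithm; `SECRET`, `sign_upload_url`, `verify_upload`, the query parameter names, and the `blob.example` host are all hypothetical. It only shows how method, key, content-type, size cap, and expiry can be bound into one signature.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def _signature(method, key, content_type, max_bytes, expires):
    # Every constraint is part of the signed message, so none can be altered.
    message = "\n".join([method, key, content_type, str(max_bytes), str(expires)])
    return hmac.new(SECRET, message.encode(), hashlib.sha256).hexdigest()


def sign_upload_url(key, content_type, max_bytes, now, ttl=900):
    """Mint a PUT-only URL for one key, one type, one size cap, ~15 minutes."""
    expires = now + ttl
    query = urlencode({
        "content_type": content_type,
        "max_bytes": max_bytes,
        "expires": expires,
        "sig": _signature("PUT", key, content_type, max_bytes, expires),
    })
    return f"https://blob.example/{key}?{query}"


def verify_upload(url, method, content_type, content_length, now):
    """What the storage side would check before accepting the bytes."""
    parts = urlsplit(url)
    key = parts.path.lstrip("/")
    q = {k: v[0] for k, v in parse_qs(parts.query).items()}
    expected = _signature(method, key, q["content_type"], q["max_bytes"], q["expires"])
    return (
        hmac.compare_digest(expected, q["sig"])
        and content_type == q["content_type"]
        and content_length <= int(q["max_bytes"])
        and now <= int(q["expires"])
    )
```

Changing the method, the key, the content-type, the size, or the clock past the expiry makes verification fail, which is the blast-radius reduction the prose describes.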
Pros
- Simplest flow: one URL, one PUT, one event
- No multipart assembly or part-tracking state
- Bytes still bypass the app entirely
Cons
- No resumability: a dropped connection restarts the whole upload
- No parallelism, so large files are slow on lossy networks
Choose this variant when
- Files comfortably under ~100 MB
- Networks are reliable (server-to-server, good connectivity)
- You want the least moving parts
Multipart / resumable upload
Split into parts, upload in parallel, retry only failed parts, assemble with one Complete call.
The standard for large files. The client asks the API to initiate a multipart upload and receives an upload_id, splits the file into parts (S3 minimum 5 MB except the last), requests a signed URL per part, and uploads them in parallel. Each part returns an ETag; when all are done the client calls CompleteMultipartUpload with the list of part numbers and ETags, and storage stitches them into one object.
The payoff is resilience and speed: parallel parts saturate bandwidth, and a network blip retries a single 5 MB part instead of restarting a 5 GB upload. The cost is operational hygiene — abandoned uploads leave billed parts behind, so you must set a lifecycle rule to abort incomplete multipart uploads after a few days. Resumable protocols like tus.io and GCS resumable uploads wrap the same mechanics with progress tracking and a resume token.
Pros
- Resumable: only failed parts retry, not the whole file
- Parallel parts maximise throughput on big files
- Progress reporting falls out naturally per part
Cons
- More client complexity (part tracking, Complete call)
- Abandoned uploads accrue cost without lifecycle cleanup
Choose this variant when
- Files above ~100 MB or unreliable mobile networks
- Users need upload progress and resume
- Video, backups, large datasets
Two-bucket trust boundary
Quarantine bucket → MIME sniff + AV scan → promote to a trusted, CDN-served bucket.
Any system accepting user content needs a trust boundary, because the client's declared content-type and extension are claims an attacker controls. Uploads land in an untrusted bucket that is never served publicly. A completion event triggers a scanner that sniffs the real MIME / magic bytes, runs an AV/policy check, and enforces content rules. Only files that pass are promoted (copied or moved) to a trusted bucket served from a different origin via CDN; failures are deleted and the upload marked failed.
This defeats the stored-XSS attack (HTML disguised as an image served from your origin), keeps malware out of the served path, and gives you a clean place to enforce per-file policy. It composes with both single-PUT and multipart — the scan runs on the completion event regardless of how the bytes arrived.
Pros
- Stops disguised-content and stored-XSS attacks
- Malware never reaches the served path
- Clear policy-enforcement and audit point
Cons
- Extra copy/promote step and a second bucket to manage
- A scanning delay before the asset becomes available
Choose this variant when
- Any public or multi-tenant user-generated content
- Files are served back to other users
- Compliance / safety requires content scanning
tus.io resumable protocol
An open, HTTP-based resumable protocol with a resume token and offset-based continuation.
When you control both client and server and want resumability without coupling to a specific cloud's multipart API, the tus open protocol is a clean choice. The client creates an upload by declaring the total length and receives an upload URL, then PATCHes data at increasing byte offsets; if the connection drops, it sends a HEAD request to that URL to learn the server's current offset and resumes from there. The resume token is just the upload URL plus the offset, so a paused upload can continue hours later or even on a different network.
tus shines for desktop sync clients and long uploads over poor connectivity. The trade-off versus native S3 multipart is that you operate a tus server (or use a managed one) that ultimately writes to blob storage, adding a hop — but you gain a portable, well-specified resume contract independent of any cloud provider.
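The offset-based resume contract can be modelled in a few lines. The sketch below is an in-memory stand-in, not the real HTTP protocol: `OffsetUploadServer` and `resume_upload` are hypothetical names, with `head` playing the role of the HEAD request that reports the current offset and `patch` the role of a PATCH at a given offset.

```python
class OffsetUploadServer:
    """In-memory stand-in for a tus-style server (no HTTP involved)."""

    def __init__(self):
        self._uploads = {}

    def create(self, upload_id: str, total_length: int) -> None:
        # The client declares the total length up front.
        self._uploads[upload_id] = {"length": total_length, "data": bytearray()}

    def head(self, upload_id: str) -> int:
        """Report how many bytes the server has stored so far."""
        return len(self._uploads[upload_id]["data"])

    def patch(self, upload_id: str, offset: int, chunk: bytes) -> int:
        """Append chunk at offset; reject it if the offset does not match."""
        upload = self._uploads[upload_id]
        if offset != len(upload["data"]):
            raise ValueError("offset mismatch")
        if offset + len(chunk) > upload["length"]:
            raise ValueError("chunk exceeds declared length")
        upload["data"] += chunk
        return len(upload["data"])

    def content(self, upload_id: str) -> bytes:
        return bytes(self._uploads[upload_id]["data"])


def resume_upload(server, upload_id: str, payload: bytes, chunk_size: int) -> None:
    """Ask the server where it stopped, then send the rest in chunks."""
    offset = server.head(upload_id)
    while offset < len(payload):
        offset = server.patch(upload_id, offset, payload[offset:offset + chunk_size])
```

Because the server is the source of truth for the offset, the client needs no local state beyond the upload identifier to continue after a drop.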
Pros
- Cloud-agnostic, well-specified resume contract
- Resume across sessions/networks via offset + token
- Good fit for desktop sync and very long uploads
Cons
- You run a tus server in front of blob storage (extra hop)
- Less "bytes never touch your fleet" than pure signed PUT
Choose this variant when
- You control the client and want provider-portable resume
- Long uploads over flaky networks (desktop sync)
Scaling path
v1 — signed single-PUT direct to blob
Get bytes off the app server with the least machinery.
The app issues a constrained, short-lived pre-signed PUT URL; the client uploads directly to S3/GCS. Even at v1, the bytes never touch your fleet — that's the non-negotiable foundation, not a later optimisation.
Enforce method, key, content-type, content-length, and expiry in the signed URL. This is correct and complete for small files on reliable networks. It falls short the moment files get large or networks get lossy: a dropped connection restarts the whole upload.
What triggers the next iteration
- No resumability — large uploads restart on any network blip
- No parallelism, so big files are slow
- No content validation yet — declared type is trusted
v2 — multipart + async processing
Handle large files reliably and move work off the request path.
Switch large files to multipart so parts upload in parallel and only failures retry. On completion, blob storage emits an event to a queue, and idempotent workers process it — scan, transcode, thumbnail — while the user sees a processing status. Never block an HTTP connection on transcoding.
Add an explicit status model and a DLQ for poison messages. The new responsibilities: idempotent processing (the event can fire more than once) and cleanup of abandoned multipart uploads.
What triggers the next iteration
- Processing event is at-least-once — workers must be idempotent
- Abandoned multipart parts accrue storage cost
- Long processing needs a status contract for the client
v3 — trust boundary + signed CDN delivery
Make uploads safe to serve and deliver them cheaply.
Insert the two-bucket trust boundary: uploads land in quarantine, get MIME-sniffed and AV-scanned, and are promoted to a trusted bucket only on pass. Serve downloads through a CDN with signed URLs / cookies from a private origin — never through the app, and never public-by-default.
Now content is safe to serve, delivery is edge-cached and cheap, and buckets are private by default. The remaining concern is cost discipline as volume grows.
What triggers the next iteration
- Scan adds latency before an asset is available
- Hot storage cost grows with retained volume
- Promote step doubles writes briefly (copy)
v4 — lifecycle, dedup, and cost control at scale
Keep storage and bandwidth costs sane as the corpus grows huge.
Add lifecycle rules to age objects from Standard → Infrequent Access → Glacier/Archive and to abort incomplete multipart uploads automatically. Deduplicate identical content by hashing (content-addressed storage), so the same file uploaded by many users is stored once. Tune CDN cache TTLs and use range requests for large media streaming.
Lifecycle rules age objects from hot to archive tiers and abort abandoned multipart uploads, cutting cost by an order of magnitude.
At this stage storage strategy is a first-class cost lever — archive-heavy workloads can cut storage spend ~10× with lifecycle transitions, and dedup plus compression cut it further.
What triggers the next iteration
- Cold-tier retrieval has latency and per-GB fetch cost
- Dedup needs a content hash index and reference counting
- Cache invalidation on re-uploaded/replaced assets
Deep dives
Why the bytes must bypass the app server
This is the load-bearing decision of the whole pattern, and naming all three reasons signals you've actually run an upload system.
Bandwidth. Your app fleet is provisioned for small JSON requests. Stream multi-gigabyte files through it and each instance's network card becomes the bottleneck; you'd scale the fleet for transfer capacity it should never carry. Blob storage is purpose-built for exactly this ingest and scales its aggregate throughput far beyond what an app fleet is provisioned for.
Memory. A naive handler buffers the upload in memory and OOMs on large files; even a streaming handler ties up a worker and disk for the entire transfer. Multiply by concurrent uploads and the fleet falls over.
Timeouts and resumability. A large upload over a mobile network can outlast your HTTP request timeout, and a single dropped connection loses the whole transfer. Signed URLs hand the transfer to the storage layer, and multipart gives the client per-part retries and resume — capabilities your app server can't easily offer.
So the app does the cheap part: a millisecond of work to mint a constrained signed URL, and a reaction to the completion event. The storage layer does the expensive part. This separation — the app signs intent; storage receives bytes — is the sentence to lead with.
Multipart mechanics, resumability, and cleanup
Multipart upload is three calls plus N part uploads, and understanding the shape lets you reason about its failure modes.
1. Initiate → storage returns an upload_id that ties the parts together.
2. Upload parts → the client PUTs each part (5 MB+ for S3, except the final part) to a per-part signed URL, in parallel, and collects the returned ETags.
3. Complete → the client sends the ordered list of (part_number, ETag); storage validates and assembles the final object atomically. Until Complete, no object is visible.
Resumability falls out of this: the client tracks which parts succeeded, and on a network failure re-requests signed URLs only for the missing parts. A 5 GB upload that drops at 90% re-sends only the parts that were in flight, not the 4.5 GB already stored. For cross-session resume you persist the upload_id and the completed-part list (or query storage's ListParts).
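A small sketch of cross-session resume, assuming the client persists its session as JSON (for example in local storage). The function names and the session layout are hypothetical; the only idea taken from the text is "persist the upload_id and the completed-part list".

```python
import json
from typing import Dict, List


def save_session(upload_id: str, total_parts: int, completed: Dict[int, str]) -> str:
    """Serialise the session; completed maps part number to ETag."""
    return json.dumps({
        "upload_id": upload_id,
        "total_parts": total_parts,
        # JSON object keys must be strings, so part numbers are converted.
        "completed": {str(n): etag for n, etag in completed.items()},
    })


def parts_to_resume(serialized: str) -> List[int]:
    """Part numbers that still need a signed URL and an upload."""
    session = json.loads(serialized)
    done = {int(n) for n in session["completed"]}
    return [n for n in range(1, session["total_parts"] + 1) if n not in done]
```

If the local record is lost, the same list can be rebuilt from the storage side by listing the parts already received for that upload_id.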
The cleanup trap. If the client never calls Complete (closed the tab, crashed), the uploaded parts sit in storage billed indefinitely: invisible because there's no final object, but very much on the invoice. The fix is mandatory: a lifecycle rule to abort incomplete multipart uploads after a few days. "Abandoned multipart uploads are a bill" is a phrase that shows operational maturity.
Part size. Choose part size to balance parallelism (more parts = more concurrency and finer retry granularity) against overhead (each part is a request); a few tens of MB per part is a common sweet spot for large media.
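Part size also has a hard constraint on S3: an upload may have at most 10,000 parts, so very large files force larger parts. The sketch below picks a size under that assumption; `choose_part_size` and the 16 MiB default are illustrative choices, not a recommendation from any SDK.

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB   # S3 minimum part size (except the last part)
MAX_PARTS = 10_000   # S3 maximum number of parts per multipart upload


def choose_part_size(file_size: int, preferred: int = 16 * MIB) -> int:
    """Smallest part size >= preferred that keeps the upload within MAX_PARTS."""
    size = max(preferred, MIN_PART)
    needed = -(-file_size // MAX_PARTS)  # ceil(file_size / MAX_PARTS)
    if needed > size:
        # Round up to a whole MiB so part boundaries stay tidy.
        size = -(-needed // MIB) * MIB
    return size
```

For most media the preferred size wins; the part-count limit only starts to matter once files reach the hundreds of gigabytes.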
The trust boundary matters more than the upload API
Most candidates spend their time on the upload mechanics and skip the security model — which is backwards, because the trust boundary is where real systems get breached.
The client's content-type and extension are claims, not facts. An attacker can upload an HTML file labelled image/png. If you store it and later serve it from your app's origin with the attacker-chosen type, you have stored XSS — the file runs in your users' browsers in your origin's security context. The same applies to malware, oversized files, and content-policy violations.
The defence is a two-bucket boundary:
1. Uploads land in an untrusted bucket that is never publicly served.
2. The completion event triggers a scanner that sniffs the real MIME from magic bytes (ignoring the declared type), runs AV / policy scanning, and enforces size and content rules.
3. Only files that pass are promoted to a trusted bucket served from a separate origin (so even a slip can't execute in your app's origin), via CDN with signed delivery.
4. Failures are deleted and the upload marked failed.
The signed upload URL does its share too — constraining method, key, content-type, and content-length shrinks the blast radius of a leaked URL to "overwrite one key with matching-type, matching-size content within 15 minutes." Defence in depth: constrain the URL and quarantine-then-scan.
Async processing and the status contract
A completed upload is not a ready asset. Scanning and transcoding take seconds to minutes, and you must never hold an HTTP connection open for them. So processing is async, and the user experience is the status model.
The upload-complete event flows through a queue to idempotent workers — scan, transcode to HLS/DASH variants, generate thumbnails, extract metadata — each advancing an explicit status: uploading → uploaded → processing → ready (or failed/canceled). The API exposes this status (and the result URLs) separately, and the client either polls it or subscribes via SSE/websocket for a live progress indicator.
Idempotency is mandatory because the completion event is delivered at-least-once and workers can be retried: key the work by upload_id + rendition, and if the output already exists with a matching checksum, mark ready instead of re-transcoding. Repeated failures route to a DLQ for inspection rather than retrying forever. The mental shift: the upload endpoint returns fast with a handle; the processor owns the processing → ready transition; the client tracks state. "A completed upload is not a ready asset; the processor owns that transition" is the framing.
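A minimal sketch of an idempotent completion handler, using an in-memory stand-in for the output bucket and status table. `RenditionStore`, `fake_transcode`, and `handle_completion` are hypothetical, and for brevity the check is "output exists" rather than the checksum comparison described above.

```python
class RenditionStore:
    """In-memory stand-in for the output bucket plus the status table."""

    def __init__(self):
        self.outputs = {}    # (upload_id, rendition) -> bytes
        self.status = {}     # upload_id -> status string
        self.transcodes = 0  # how many times real work was done


def fake_transcode(source: bytes, rendition: str) -> bytes:
    # Placeholder for the expensive step.
    return rendition.encode() + b":" + source


def handle_completion(store, upload_id, source, renditions=("360p", "720p")):
    """Process one completion event; safe to call again for the same upload."""
    if store.status.get(upload_id) == "ready":
        return  # duplicate delivery of an already-finished upload
    store.status[upload_id] = "processing"
    for rendition in renditions:
        key = (upload_id, rendition)
        if key in store.outputs:
            continue  # produced by an earlier, partially completed attempt
        store.outputs[key] = fake_transcode(source, rendition)
        store.transcodes += 1
    store.status[upload_id] = "ready"
```

Keying on upload_id plus rendition means a redelivered event does no work, and a retry after a crash only fills in the renditions that are missing.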
Delivery: private by default, signed, CDN-cached
Upload is only half the system — serving the bytes back has its own pitfalls, and the defaults are dangerous.
Never public-by-default. A bucket with a public-read ACL means anyone who guesses or obtains an object URL can read it, with no authorization check and no expiry. Buckets should be private; access is granted per request via a signed download URL (or signed cookies for a session) that the app mints after an authorization check. This keeps the access decision in your code, not in a bucket ACL.
Serve through a CDN, not the app. Just as uploads bypass the app, downloads should too. The app issues a short-lived signed URL pointing at the CDN; the CDN serves from cache, fetching from the private origin bucket only on a miss. This gives edge latency, offloads bandwidth, and — for large media — supports range requests so players can seek and stream without downloading the whole file.
Large-media specifics. Video is delivered as adaptive-bitrate segments (HLS/DASH) produced during transcoding, so the player fetches small cached segments at a bitrate matched to the viewer's bandwidth. A signed URL on the manifest does not by itself cover the segment requests, so the segments need the same protection, typically signed cookies or a signed policy that covers the whole path prefix. The throughline with the rest of the pattern: the app stays on the control path (authorize, sign, react), and the storage + CDN layer owns the data path in both directions.
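To make the range requests mentioned above concrete, this sketch resolves a single-range HTTP Range header against an object size. In practice the CDN or blob store does this for you; `parse_range` is a hypothetical helper and ignores multi-range requests.

```python
from typing import Optional, Tuple


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a 'bytes=' header to inclusive (start, end) offsets.

    Returns None when the range is malformed or cannot be satisfied.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    if first == "":  # suffix form: the final N bytes
        if not last.isdigit() or int(last) == 0:
            return None
        return max(size - int(last), 0), size - 1
    if not first.isdigit() or (last and not last.isdigit()):
        return None
    start = int(first)
    # An end past the object is clamped to the last byte.
    end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        return None
    return start, end
```

A player seeking into the middle of a file sends a header like this, and only the requested byte span travels from the edge cache to the viewer.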
Storage classes, lifecycle, and dedup as cost levers
At scale, blob storage cost is dominated by what you keep and how — and lifecycle policy is the lever most candidates never mention.
Storage classes trade retrieval latency/cost for storage price. A typical policy: Standard for the first 30 days (frequent access), Infrequent Access / Nearline for 30–90 days, Glacier / Archive beyond 90 days for objects rarely read. Lifecycle rules automate the transitions, and for archive-heavy workloads (backups, old media) this can cut storage spend by roughly an order of magnitude. The catch: cold tiers have retrieval latency (minutes to hours) and a per-GB fetch fee, so only age objects whose access pattern truly justifies it.
Cleanup as cost control. The abort-incomplete-multipart rule (covered earlier) prevents phantom part storage. Expiring temporary derivatives and old versions keeps the bucket from growing without bound.
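The tiering and cleanup policy above can be sketched as data. The dictionary below follows the general shape of an S3 lifecycle configuration as I understand it (check field names and storage-class identifiers against your provider's documentation before use). The rule ID and the helper `storage_class_at` are hypothetical, and the 7-day abort window is the figure used earlier in this pattern.

```python
# Lifecycle policy as data: two transitions plus the multipart cleanup rule.
LIFECYCLE_POLICY = {
    "Rules": [
        {
            "ID": "tier-and-clean",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},         # apply to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}


def storage_class_at(age_days: int, policy: dict = LIFECYCLE_POLICY) -> str:
    """Return the tier an object of the given age would sit in under the policy."""
    current = "STANDARD"
    transitions = sorted(policy["Rules"][0]["Transitions"], key=lambda t: t["Days"])
    for transition in transitions:
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current
```

Keeping the policy in version control next to the application makes the tiering decision reviewable instead of a console setting nobody remembers.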
Deduplication. Content-addressed storage keys each object by a hash of its content, so the same file uploaded by many users (a viral video, a common attachment) is stored once with a reference count, not N times. Combined with compression for compressible types, dedup cuts both storage and re-upload bandwidth, because the client can skip any part whose hash the server already holds. The summary: treat storage tier, lifecycle, and dedup as explicit design choices, not defaults, because storage is often the largest line item.
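A minimal in-memory sketch of content-addressed storage with reference counts. The class `ContentAddressedStore` and its methods are hypothetical, and SHA-256 is an assumed hash choice; a real system would keep the index in a database and the bytes in blob storage.

```python
import hashlib


class ContentAddressedStore:
    """Stores each distinct payload once, keyed by the hash of its content."""

    def __init__(self) -> None:
        self._blobs = {}  # digest -> bytes
        self._refs = {}   # digest -> number of logical owners

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first copy: actually store the bytes
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def has(self, digest: str) -> bool:
        """Lets a client ask 'do you already hold this?' before uploading."""
        return digest in self._blobs

    def release(self, digest: str) -> None:
        """Drop one reference; delete the bytes when nobody owns them."""
        remaining = self._refs[digest] - 1
        if remaining == 0:
            del self._refs[digest]
            del self._blobs[digest]
        else:
            self._refs[digest] = remaining

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())
```

The reference count is what makes deletion safe: one user removing their copy must not delete bytes another user still points at.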
Decision levers
Single PUT vs multipart
Cutoff around 100 MB. Below: single signed PUT is simpler. Above (or on lossy mobile networks): multipart wins — parallel parts, part-level retries, resume after a blip. Always cap content-length in the signed URL regardless.
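The cutoff rule can be sketched as a small planning function. The 100 MB threshold comes from the text and the 5 MB part size from the strong answer earlier; `UploadPlan`, `plan_upload`, and the `lossy_network` flag are hypothetical names, and real part sizes would also need to respect the provider's part-count limits.

```python
import math
from dataclasses import dataclass

MB = 1024 * 1024
MULTIPART_CUTOFF = 100 * MB  # threshold from the text
PART_SIZE = 5 * MB           # assumed part size for this sketch


@dataclass(frozen=True)
class UploadPlan:
    mode: str        # "single_put" or "multipart"
    part_count: int
    max_bytes: int   # size cap to enforce in the signed URL(s)


def plan_upload(size_bytes: int, lossy_network: bool = False) -> UploadPlan:
    """Pick single PUT below the cutoff, multipart above it or on lossy links."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if size_bytes > MULTIPART_CUTOFF or lossy_network:
        parts = math.ceil(size_bytes / PART_SIZE)
        return UploadPlan("multipart", parts, size_bytes)
    return UploadPlan("single_put", 1, size_bytes)
```

Either way the plan carries `max_bytes`, reflecting the rule that the size cap is enforced in the signed URL regardless of upload mode.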
Processing: sync vs async
Always async. Upload completion emits an event; idempotent workers process it (scan, transcode, thumbnail) while the user sees a processing status. Never block an HTTP connection on transcoding; route poison messages to a DLQ.
Trust boundary placement
Validate content-type and size in the signed URL (server-enforced). After upload, sniff real MIME / magic bytes and AV-scan in an untrusted bucket; promote only passing files to a trusted bucket served from a separate origin.
Delivery path
Private buckets by default; serve via signed download URLs / cookies through a CDN from a private origin. Use range requests and adaptive-bitrate segments for large media. Downloads bypass the app just like uploads.
Storage class strategy
Standard for hot (0–30d), Infrequent Access/Nearline (30–90d), Glacier/Archive (90d+). Lifecycle rules automate transitions for ~10× archive savings, plus abort-incomplete-multipart cleanup. Cold tiers add retrieval latency and fetch cost.
Deduplication
Content-address by hash so identical files are stored once with reference counts; skip uploading parts whose hash already exists (delta sync). Combine with compression. The largest cost saving for high-overlap corpora.
Failure modes
Streaming upload bytes through the app servers makes the fleet the bandwidth bottleneck, lets instances OOM on big files, and lets slow uploads eat the request timeout. Always issue a signed URL and upload direct to blob storage.
A tampered client uploads a 100 GB object to your bucket. Enforce content-length in the signed URL, set bucket quotas, and alert on unusual growth.
A default-public bucket ACL means anyone with a URL reads any object. Default to private; serve via signed download URLs or a CDN with signed cookies.
A client labels an HTML file image/png; served from your origin it becomes stored XSS. Sniff real MIME / magic bytes server-side and serve from a separate origin.
Blocking the HTTP connection on transcoding ties up workers and times out. Process async via blob event → queue → idempotent worker, with a status model.
Abandoned uploads leave parts billed forever, invisible because there is no final object. Add a lifecycle rule to abort incomplete multipart uploads after a few days.
At-least-once completion events plus worker retries cause duplicate transcodes or double DB writes. Key work by upload_id + rendition and short-circuit if the output already exists.
Case studies
Dropbox
Dropbox — content-addressed blocks, dedup, and resumable sync
Dropbox's storage system splits every file into fixed-size blocks (historically 4 MB) and addresses each block by the hash of its contents. This content-addressing yields two big wins that this pattern highlights as cost levers. First, deduplication: if a block's hash already exists in storage, it isn't uploaded or stored again — so a file shared across many users, or an unchanged file re-synced, costs almost nothing. Second, delta sync: when a file changes, only the blocks whose hashes changed are uploaded, not the whole file.
The client maintains a manifest of block hashes; on sync it asks the server which blocks are missing and uploads only those, then commits the file as an ordered list of block hashes. This is multipart upload taken to its logical conclusion — parts are content-addressed and globally deduplicated — and it makes uploads inherently resumable: an interrupted sync resumes by re-checking which blocks still need sending.
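The manifest-and-missing-blocks exchange described above can be sketched as follows. This is an illustration of the idea, not Dropbox's actual protocol or code: `block_manifest` and `blocks_to_upload` are hypothetical helpers, and SHA-256 is an assumed hash.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the historical block size named in the text


def block_manifest(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a file into fixed-size blocks and return the ordered block hashes."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def blocks_to_upload(manifest: list, server_has: set) -> list:
    """Return indices of blocks the server lacks, sending each distinct hash once."""
    known = set(server_has)
    missing = []
    for index, digest in enumerate(manifest):
        if digest not in known:
            missing.append(index)
            known.add(digest)  # a repeated block within the file is sent only once
    return missing
```

With a tiny block size for demonstration, editing the middle of a three-block file yields a single index to upload, and an interrupted sync resumes by calling `blocks_to_upload` again with the server's current set.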
Dropbox also famously separated the metadata plane (which blocks make up which file, permissions, sync state) from the block storage plane (the bytes), so the control path and data path scale independently — exactly the "app signs intent; storage holds bytes" separation. The takeaway: content-addressed blocks turn dedup, delta sync, and resumability into one mechanism.
YouTube
YouTube — resumable upload, async transcode ladder, and status
YouTube's upload uses Google's resumable upload protocol: the client starts a session and gets a session URI, then uploads bytes in chunks; if the connection drops, it queries the session for the current offset and resumes — no restart. This is the same offset-based resume contract as tus, applied at planet scale, and it's why a creator on a flaky connection can upload a multi-gigabyte 4K video reliably.
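The offset-based resume contract can be sketched with an in-memory simulation. This is not Google's or tus's wire protocol; `UploadSession`, `upload`, and the `fail_after_chunks` knob that simulates a dropped connection are all hypothetical.

```python
from typing import Optional


class UploadSession:
    """Server side: accepts bytes only at the current offset."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self._received = bytearray()

    @property
    def offset(self) -> int:
        return len(self._received)

    @property
    def complete(self) -> bool:
        return self.offset == self.total_size

    def append(self, offset: int, chunk: bytes) -> int:
        if offset != self.offset:
            raise ValueError(f"expected offset {self.offset}, got {offset}")
        if self.offset + len(chunk) > self.total_size:
            raise ValueError("chunk exceeds declared size")
        self._received.extend(chunk)
        return self.offset

    def received_bytes(self) -> bytes:
        return bytes(self._received)


def upload(session: UploadSession, data: bytes, chunk_size: int,
           fail_after_chunks: Optional[int] = None) -> None:
    """Client side: ask the session where to resume, then send from there."""
    offset = session.offset  # the resume query; 0 for a fresh session
    sent = 0
    while offset < len(data):
        if fail_after_chunks is not None and sent >= fail_after_chunks:
            raise ConnectionError("simulated network drop")
        offset = session.append(offset, data[offset:offset + chunk_size])
        sent += 1
```

The client holds no resume state of its own: after a drop it calls `upload` again, and the session's offset is the single source of truth.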
Once bytes land, the work is overwhelmingly asynchronous. A single source file is transcoded into a ladder of resolutions and codecs (from low-bitrate mobile up to 4K/8K, across multiple codecs) so playback can adapt to each viewer's device and bandwidth via adaptive-bitrate streaming. Transcoding a long, high-resolution video takes minutes, so the upload API returns immediately and the creator watches an explicit status progress through processing — exactly the "a completed upload is not a ready asset" contract. Lower resolutions are typically published first so the video is watchable while higher-quality renditions finish.
Delivery is then segment-based adaptive streaming from a global CDN, with the heavy transcoded variants cached at the edge. The architecture is the textbook large-blob pipeline: resumable direct ingest → async transcode fan-out → status-driven UX → CDN segment delivery — just at extraordinary scale.
Imgur / UGC platforms
User-generated image hosts — the quarantine-and-scan trust boundary
Image- and file-hosting platforms that accept anonymous or low-friction uploads live or die by their trust boundary, because they serve user content back to other users at massive scale — the ideal conditions for content-based attacks. The hard-won lesson across this category is that the client's declared content-type and extension cannot be trusted, and content must be validated before it is ever served.
The standard architecture matches this pattern's two-bucket model: uploads land in a quarantine bucket that is never on the served path; a scanning pipeline sniffs the true MIME from magic bytes (rejecting an HTML file masquerading as a PNG), runs malware and content-policy checks, often re-encodes/normalises the image (stripping metadata and any embedded payloads by decoding and re-encoding to a known-good format), and only then promotes the sanitised asset to a trusted bucket served from a separate domain via CDN. Serving user content from a different origin than the application is itself a defence — even if a malicious file slips through, it can't execute in the app's security context, neutralising stored-XSS.
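A minimal sketch of the magic-byte check in that scanning pipeline. The signature table covers only a few common formats, and `sniff_mime` and `admit` are hypothetical helpers; a production scanner would use a maintained detection library and still re-encode images afterwards.

```python
from typing import Optional

# Leading byte signatures for a handful of formats (deliberately not exhaustive).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(head: bytes) -> Optional[str]:
    """Infer the type from the file's first bytes, ignoring any declared type."""
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


def admit(declared_type: str, head: bytes) -> bool:
    """Promote only when the sniffed type is known and matches the declaration."""
    sniffed = sniff_mime(head)
    return sniffed is not None and sniffed == declared_type
```

An HTML payload labelled `image/png` fails `admit` because its first bytes match no image signature, so it never leaves the quarantine bucket.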
This category also leans hard on CDN delivery with signed/cached access and aggressive deduplication (the same meme uploaded a million times is stored once), tying together the trust-boundary and cost themes of the pattern.
Decision table
Blob handling separates upload, processing, trust, and delivery — design each.
| Decision | Default | When to change it | Robust answer includes |
|---|---|---|---|
| Upload path | Signed direct-to-blob URL | Tiny payloads can go through the app | Method/key/type/size/expiry constraints |
| Large upload | Multipart / resumable | Single PUT below ~100 MB | Part retries and abandoned-upload cleanup |
| Processing | Async event → queue → worker | Inline only for trivial validation | Status model, idempotency, retries, DLQ |
| Trust boundary | Untrusted → scan → trusted bucket | Private internal files may skip it | MIME sniffing, AV scan, separate origin |
| Delivery | Private bucket + signed CDN URL | Truly public assets can be cached open | Signed URLs/cookies, range requests |
| Cost | Lifecycle tiering + dedup | Small/short-lived corpora may not need it | Class transitions, multipart abort, hashing |
- Enforce content-length and content-type in the signed URL, not just in the client.
- A completed upload is not a ready asset — the processor owns the transition to "ready".
Drills
Why signed URLs instead of streaming through the app?
Three reasons. Bandwidth — the app fleet would become the transfer bottleneck and cost centre for traffic it should never carry. Memory — large uploads buffer to RAM and OOM, or tie up a worker and disk for the whole transfer. Timeout/resumability — a big upload over a mobile network can outlast the HTTP timeout, and a dropped connection loses everything; signed URLs hand the transfer to storage and multipart gives per-part retries. The app spends a millisecond minting a constrained URL; storage handles the ingest.
An attacker has a leaked signed upload URL. What's the damage?
It depends entirely on the constraints. If the URL is scoped to PUT + the exact key + content-type + content-length + a 15-minute expiry, the worst they can do is overwrite that one key with content of matching type and size within 15 minutes — blast radius of one object. Without those constraints (just "upload to the bucket"), they can upload anything, anywhere in the bucket, for the URL's lifetime — polluting it with arbitrary, possibly malicious content. Constrain the URL to shrink the blast radius.
Why is processing always asynchronous?
Scanning and transcoding take seconds to minutes; holding an HTTP connection open for them ties up a worker, risks a timeout, and couples upload latency to processing. Instead the upload completes fast, blob storage emits an event to a queue, and idempotent workers process it while the user watches a status indicator. The upload endpoint returns a handle; the processor owns the transition to ready.
Why a two-bucket trust boundary, and what does the scanner check?
Because the client's content-type and extension are attacker-controlled claims — an HTML file labelled image/png, served from your origin, is stored XSS. Uploads land in an untrusted bucket that is never served; a scanner sniffs the real MIME from magic bytes (ignoring the declared type), runs AV/policy checks and size limits, often re-encodes images to strip embedded payloads, and only promotes passing files to a trusted bucket served from a separate origin. Even a slip-through can't execute in the app's security context.
You forgot one lifecycle rule and your storage bill is mysteriously high. Which rule?
The abort-incomplete-multipart rule. When a client initiates a multipart upload and never calls Complete (closed the tab, crashed), the uploaded parts persist in storage billed indefinitely — and they're invisible because there's no final object to see in a normal listing. A lifecycle rule that aborts incomplete multipart uploads after a few days reclaims them. "Abandoned multipart uploads are a bill."
How do you make the processing worker safe to retry?
Idempotency keyed by upload_id + rendition. The completion event is at-least-once and workers retry on failure, so the worker first checks whether the output already exists with a matching checksum; if so it marks ready and returns instead of re-transcoding. DB writes are upserts on the same key, not blind inserts. Genuine repeated failures route to a DLQ for inspection rather than looping forever. This makes duplicate deliveries and retries harmless.
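The retry-safe worker can be sketched in memory. `RenditionWorker`, its dictionaries standing in for blob storage and the database, and the `max_attempts` threshold are hypothetical; the injected `transcode` callable stands in for the real transcoder.

```python
import hashlib


class RenditionWorker:
    """Processes (upload_id, rendition) jobs so duplicate deliveries are harmless."""

    def __init__(self, transcode, max_attempts: int = 3) -> None:
        self.transcode = transcode
        self.max_attempts = max_attempts
        self.outputs = {}   # (upload_id, rendition) -> (bytes, checksum)
        self.status = {}    # same key -> status; assignment acts as an upsert
        self.attempts = {}  # same key -> failed attempt count
        self.dlq = []       # keys that exhausted their retries

    def handle(self, upload_id: str, rendition: str, source: bytes) -> str:
        key = (upload_id, rendition)
        existing = self.outputs.get(key)
        if existing is not None:
            output, checksum = existing
            if hashlib.sha256(output).hexdigest() == checksum:
                self.status[key] = "ready"  # short-circuit: work already done
                return "skipped"
        try:
            output = self.transcode(source, rendition)
        except Exception:
            self.attempts[key] = self.attempts.get(key, 0) + 1
            if self.attempts[key] >= self.max_attempts:
                self.dlq.append(key)        # stop looping; park for inspection
                self.status[key] = "failed"
                return "dead-lettered"
            return "retry"
        self.outputs[key] = (output, hashlib.sha256(output).hexdigest())
        self.status[key] = "ready"
        return "processed"
```

Delivering the same completion event twice runs the transcoder once; the second delivery sees a valid output and only re-asserts the ready status.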
When to reach for this