File Thunder — Architecture & Design Guide V2

NexGate Media Engine | Version 2.0 
 
 v2 integrates the media-security refinements worked out in design review.
Every change from v1 is tagged inline as [v2 FIX] so it can be reviewed individually.
Section 0 lists all of them in one place — read that first. 
 
 
 0. What changed in v2 (read this first) 
 These are the corrections that came out of design review. Each is explained in full in its section. 
 
 Signing is three tiers, not two. "Public = unsigned, private = signed" was wrong. Correct model: trivial public assets unsigned; public content (full images/videos) signed even though public ; private/paid signed + auth. → §8 
 Public content gets signed too. A signed URL is anti-scrape / anti-hotlink, not only access control. TikTok signs even public videos. → §8, §14 
 The visibility-bypass hole must be closed. In v1, media URLs were permanent and unsigned, so a PRIVATE post's bytes were reachable by anyone with the link — the post was hidden by the API but the file was not. Fixed by locking the bucket (no public-read) and routing everything through signed CDN URLs. → §14 
 Backend signs, CDN validates, they never talk at sign time. Signing is pure math on (path + expiry + secret) . No file, no CDN contact. CDN fetches from MinIO lazily on first view. → §9 
 Domain is media.nexgate.co , not .com . v1 mixed both. One domain everywhere. → §10 
 Object keys must be owner-correct. v1 had one user's avatar stored under another user's prefix. The owner segment in the key must match the real owner. → §3 
 Store the objectKey, never the signed URL. Signed URLs are minted per request and thrown away. Never persisted in the DB or feed cache. → §13, §23, §27 
 MP4 and HLS use different token strategies. Short MP4 = one signed URL. Long HLS = dual-token (short manifest + long segments) or prefix-signing, so playback never dies mid-video. → §11 
 Single-use is for downloads only, never streams. A stream is many requests; single-use would break playback. Streams use short expiry + user-binding instead. → §11, §12 
 Watermarked file is NEVER in the feed response. Feed = clean playback URLs only. Watermarked download comes from a separate endpoint, minted on tap. → §12, §13 
 Cache gotcha: never cache a signed object keyed by its signature. A cached object can outlive its stamp; private/paid content must bypass shared cache. → §10, §11 
 Bot gate is the missing half of anti-scraping. Signed URLs stop permanent theft; the bot gate stops the act of mass-pulling. v1 had no bot gate section. → §14 
 Open decision parked: NexGate's moat (content vs commerce) drives how hard to stamp public media and how heavy the bot gate needs to be. → §30 
 
 
 1. What is File Thunder? 
 File Thunder is NexGate's dedicated media processing engine — a standalone Spring Boot
microservice responsible for all file and media operations across the platform. 
 Core principle: The Main Backend never touches raw files. It delegates all media
operations to File Thunder and acts as a thin client. 
 Responsibilities: receive uploads, validate (type/size/quota), virus scan, transcode,
generate variants (thumbnails, WebP, HLS, MP4), watermark, store in MinIO, serve via CDN,
mint signed delivery URLs, publish media-ready events. 
 Not: AI/ML, recommendation, social graph, payments, auth, business logic. 
 
 2. The Four Wheels 
                   FILE THUNDER
 ┌───────────┬───────────┬───────────┬───────────┐
 │  FFmpeg   │ImageMagick│  ClamAV   │   MinIO   │
 │  (video)  │ (images)  │ (security)│ (storage) │
 └───────────┴───────────┴───────────┴───────────┘
   transcode   resize +    virus       object
   HLS, w/mark WebP, blur  scan        store + CDN
 
 
 FFmpeg (Jaffree) — transcode, HLS segmenting, thumbnail/best-frame, 3s preview, watermark overlay, blur-pad, fMP4, format conversion, audio extract/normalize. 
 ImageMagick (IM4Java) — resize variants, format→WebP, EXIF strip (privacy), auto-orient, dominant color, LQIP, OG image, GIF→WebP, PDF→preview. Plus blurhash-java . 
 ClamAV — virus scan every upload (TCP clamav:3310 ), hash-cache, digital products scanned twice. 
 MinIO — S3-compatible storage, presigned direct uploads, origin for Cloudflare, multipart + byte-range, lifecycle policies. Never publicly exposed (only Cloudflare + File Thunder reach it). 
 
 
 3. Storage Architecture (MinIO Buckets) 
 Rule: 4 buckets total. Never one bucket per user. 
 nexgate-raw/ ← temporary upload landing (auto-delete 24h)
nexgate-public/ ← social content (avatars, posts, stories, events, shops)
nexgate-private/ ← DMs, documents, KYC, temp audio
nexgate-digital/ ← purchase-gated products
 
 Object Key Pattern 
 {bucket}/{domain}/{ownerId}/{entityId}/{fileId}/{variant}

nexgate-public/posts/usr_123/post_456/file_789/720p_clean.mp4
nexgate-private/messages/conv_abc/file_789/original.jpg
nexgate-digital/products/shop_xyz/prod_123/file_789/original/file.pdf
 
 
 [v2 FIX] Owner-correct keys. The {ownerId} segment MUST be the real owner of the
file. v1 had cases where one user's avatar lived under a different user's prefix — that
leaks identity and misattributes ownership. The owner in the key = the owner in the DB.
Keys are otherwise opaque (no usernames, no PII). 
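
 The owner-correct rule can be enforced where keys are built and re-checked before a key is persisted. The following is a minimal sketch under assumptions: the class name ObjectKeys, its methods, and the allowed-character patterns are hypothetical, and keys are treated as relative to the bucket (as in the posts/usr_123/... examples). Nested variants such as hls/master.m3u8 would need a looser variant pattern.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: builds {domain}/{ownerId}/{entityId}/{fileId}/{variant} keys. */
final class ObjectKeys {
    // Opaque segments only: letters, digits, '_' and '-'. No slashes, dots, or free text.
    private static final Pattern SEGMENT = Pattern.compile("[A-Za-z0-9_-]{1,64}");
    // Variant file name such as 720p_clean.mp4 (assumed shape).
    private static final Pattern VARIANT = Pattern.compile("[A-Za-z0-9_-]{1,64}\\.[a-z0-9]{2,5}");

    private ObjectKeys() {}

    /** Builds a bucket-relative key; ownerId must come from the DB record, not from the client. */
    static String build(String domain, String ownerId, String entityId, String fileId, String variant) {
        for (String segment : new String[] {domain, ownerId, entityId, fileId}) {
            if (segment == null || !SEGMENT.matcher(segment).matches()) {
                throw new IllegalArgumentException("bad key segment: " + segment);
            }
        }
        if (variant == null || !VARIANT.matcher(variant).matches()) {
            throw new IllegalArgumentException("bad variant: " + variant);
        }
        return String.join("/", domain, ownerId, entityId, fileId, variant);
    }

    /** True when the owner segment of the key equals the owner stored in the DB. */
    static boolean isOwnerCorrect(String objectKey, String dbOwnerId) {
        String[] parts = objectKey.split("/");
        return parts.length >= 2 && parts[1].equals(dbOwnerId);
    }
}
```

 A processing job could call isOwnerCorrect just before writing object_key to the DB and fail the job on a mismatch, so a v1-style misfiled avatar is caught at write time rather than discovered later.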
 
 nexgate-public layout (unchanged from v1) 
 profiles/{userId}/{fileId}/ avatar_400.webp, avatar_150.webp, avatar_50.webp, cover.webp
posts/{userId}/{fileId}/ {360,720,1080}p_clean.mp4, {360,720,1080}p_watermarked.mp4,
 preview_3s.mp4, hls/master.m3u8, hls/{360,720,1080}p/seg_*.ts,
 thumbnail.webp, og_clean.webp, og_play.webp, og_preview.mp4
stories/{userId}/{fileId}/ 720p_clean.mp4, thumbnail.webp
events/{accountId}/{eventId}/{fileId}/ banner.webp, banner_mobile.webp, banner_thumb.webp
shops/{shopId}/{productId}/{fileId}/ large.webp, medium.webp, thumb.webp
 
 (See §12 for the watermarked-variant storage decision — store one rung, not all three.) 
 nexgate-private / nexgate-digital 
 As v1: messages, documents, kyc, temp audio (private); products with original/ (never
exposed) + preview/ + cover/ (digital). 
 
 4. Accepted File Formats 
 
 Images accept JPEG/PNG/HEIC/WebP/GIF; reject BMP/TIFF/RAW/SVG (SVG = security risk). Output always WebP . 
 Videos accept MP4/MOV/MKV/WEBM/AVI/3GP; reject WMV/FLV/VOB. Output H.264 MP4 (short) or HLS H.264 (long). 
 Digital products accept PDF, DOCX/XLSX/PPTX, GLB/OBJ/FBX/STL, MP3/WAV/FLAC, ZIP/RAR, PSD/AI, PNG. 
 
 
 5. Upload Flow 
 [1] Client-side intelligent compression (max 1080p, CRF 18 light, HEIC→JPEG)
[2] POST /media/upload-request { fileName, fileSize, mimeType, directory, clientMeta }
[3] File Thunder: validate MIME + size + quota (atomic SQL) → DB record PENDING
 → presigned MinIO URL (nexgate-raw) → { fileId, uploadUrl, expiresIn: 1800 }
[4] Client uploads directly to MinIO (TUS resumable / multipart >5MB)
[5] Client confirms POST /media/confirm { fileId } (cleanup job recovers if missed)
[6] Processing pipeline starts
 
 client compress
        │
        ▼
 POST upload-request ──▶ validate + quota ──▶ presigned URL
                                                    │
                                                    ▼
                                    upload direct to MinIO (raw)
                                                    │
                                                    ▼
 confirm ──▶ ClamAV ──▶ dedup ──▶ process ──▶ READY
                                                │
                                                ▼
                              RabbitMQ media.ready ──▶ Main Backend
 
 Quota checked at presigned-URL time (atomic UPDATE ... WHERE used+size <= quota ).
 Duration checked AFTER upload via FFprobe (client value is a hint, not trusted) — over
limit → delete from raw, release quota, 403 DURATION_EXCEEDED . 
 Per-Upload Limits 
 
 
 
 Plan      Image  Short Video  Long Video  Digital Video  Duration
 --------  -----  -----------  ----------  -------------  ---------
 FREE      20MB   200MB        ❌          ❌             5 min
 PRO       20MB   500MB        2GB         5GB            60 min
 BUSINESS  20MB   2GB          5GB         20GB           Unlimited
 
 
 
 
 6. Processing Pipelines (summary) 
 All pipelines: ClamAV scan → SHA-256 dedup check → process → store → delete raw →
DB READY → publish events. Watermarking and all placeholders are generated here, at
processing time , never at serve time. 
 6.1 Image 
 Auto-orient → strip EXIF (privacy) → NSFW check → dominant color + LQIP + BlurHash →
size variants per content type (Post: 1600/800/300; Profile: 400/150/50; Product: 1000/500/200
square; Event: 1200×630/800×420) → WebP → OG image → store public → READY. 
 6.2 Short Video (MP4, duration < 3 min) 
 FFprobe → adaptive transcode decision (never upscale, never inflate) → fast lane (quick
360p → LIVE_PARTIAL ) → full processing: 
 
 Clean variants {360,720,1080}p_clean.mp4 (faststart). Non-9:16 reel-eligible → blur-pad to 9:16 (scale-fill+blur background, scale-fit sharp foreground, overlay). 
 Watermarked variant(s) — moving watermark (jumps corners every 3s; 80×80 logo 60% opacity + username; app-based label from clientApp ). [v2 FIX] store one download rung (720p), not all three (§12). 
 Extras: thumbnail.webp (best-frame scored) + its LQIP/BlurHash/dominantColor, preview_3s.mp4 (360p muted watermarked), og_clean.webp , og_play.webp , og_preview.mp4 . 
 fMP4 ( frag_keyframe+empty_moov+faststart ) for byte-range. 
 Extract audio.wav → nexgate-private (Rec Engine). 
 READY: isReelEligible (<3min), streamingFormat: MP4 , variants JSONB, aspectRatio, qualityScore, duration. 
 
 6.3 Long Video (HLS, duration ≥ 3 min) 
 Same as short EXCEPT: HLS adaptive (no fast lane — stays PROCESSING until done).
 hls/{360,720,1080}p/ segments + master.m3u8 . Segment = 2s, H.264 + AAC, MPEG-TS.
Still makes preview_3s, thumbnail, placeholders, OG. isReelEligible: false , streamingFormat: HLS . 
 6.4 Digital Products 
 Double ClamAV scan → SHA-256 checksum → light preview by type (PDF p1 watermarked,
video first 2min 480p watermarked, audio 30s 96kbps, 3D multi-angle thumbs, image 600px
watermarked) → store original/ (never exposed) + preview/ + cover/ → READY. 
 6.5 DM Attachments 
 nexgate-private, no CDN , conversation-membership check, thumb+original only, no
watermark, no reel pool, signed 5-min URLs. Virus scanned like everything. 
 
 7. Watermarking Strategy 
 
 Applied at processing time, once. Stored as separate _watermarked variant. Never at serve time, never client-side. 
 Moving watermark (corners every 3s) — static is croppable, moving is not. 
 Served: in-app playback → clean variant; download button → watermarked variant (§12). Direct CDN URL → clean (unavoidable; accepted, same tradeoff every platform has). 
 Pre-watermarked uploads (from other platforms): Phase 1 = no detection, just overlay ours. Phase 3 = AI logo detection → suppress reach (Instagram model), never block. 
 Username change: old videos keep old username (like TikTok). 
 
 
 [v2 FIX] Honest scope. A watermark does NOT prevent download — it survives download.
Its job is attribution + traceability, not prevention. The clean file is reachable by a
determined ripper via the playback URL; that's accepted. Goal = "not worth the effort for
99% + traceable if they do." 
 
 
 8. The Three Delivery Tiers [v2 FIX — core correction] 
 v1 assumed two levels (public unsigned / private signed). Correct model is three : 
 
 
 
 Tier                 Content                                         Login to view?  URL signed?  Expiry   CDN cache
 -------------------  ----------------------------------------------  --------------  -----------  -------  --------------
 1 (trivial public)   avatars, thumbnails, posters, blur previews     no              no           n/a      aggressive, 1y
 2 (public content)   full post images, video rungs                   no              YES          hours    careful (§10)
 3 (private / paid)   DM media, private posts, digital products, KYC  yes             YES          minutes  none / private
 
 
 
 ┌──────────────────────────────────────────────┐
 │ TIER 1 trivial public │
 │ avatars, thumbs → unsigned, cache 1y │
 ├──────────────────────────────────────────────┤
 │ TIER 2 public content │
 │ full images/video → SIGNED, hrs TTL │
 ├──────────────────────────────────────────────┤
 │ TIER 3 private / paid │
 │ DM, private, digital → SIGNED + auth + x-uid │
 └──────────────────────────────────────────────┘
 only diff T2 vs T3 = auth check before minting
 
 The key insight: "public to view" and "signed URL" are independent. Tier 2 content is
open to watch by anyone (no login), but the file is still signed — because signing is
 anti-scrape / anti-hotlink , not access control. This is exactly TikTok's model (public
video, yet the raw file link expires and 403s). 
 The main difference between tier 2 and tier 3 in code is whether an auth/visibility check
runs before minting; the HMAC signing itself is identical. Tier 3 additionally binds x-uid,
uses a minutes-scale expiry, and bypasses shared cache (§10). 
 
 Tier 1 is the only unsigned tier. Anything a scraper would actually want (full media)
is tier 2+, i.e. signed. 
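
 The tier table can be encoded as a small policy type so the mint path has a single place to ask "sign or not, auth first or not". This is a sketch under assumptions: DeliveryPolicy and its member names are hypothetical, and the concrete TTLs (3 hours, 5 minutes) are placeholders for the "hours" and "minutes" columns above.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical encoding of the three delivery tiers. */
final class DeliveryPolicy {
    enum Tier {
        TRIVIAL_PUBLIC(false, false, null),
        PUBLIC_CONTENT(true, false, Duration.ofHours(3)),   // "hours": exact value assumed
        PRIVATE_PAID(true, true, Duration.ofMinutes(5));    // "minutes": exact value assumed

        final boolean signed;          // does the URL carry x-expires / x-sig?
        final boolean authBeforeMint;  // must an auth/visibility check pass first?
        private final Duration ttl;    // null for unsigned, permanent URLs

        Tier(boolean signed, boolean authBeforeMint, Duration ttl) {
            this.signed = signed;
            this.authBeforeMint = authBeforeMint;
            this.ttl = ttl;
        }

        Optional<Duration> ttl() {
            return Optional.ofNullable(ttl);
        }
    }

    enum Decision { UNSIGNED, SIGNED, SIGNED_USER_BOUND, DENY }

    private DeliveryPolicy() {}

    /** Decides what kind of URL (if any) to hand out for a tier and viewer. */
    static Decision decide(Tier tier, boolean viewerAuthorized) {
        if (!tier.signed) return Decision.UNSIGNED;
        if (!tier.authBeforeMint) return Decision.SIGNED;
        return viewerAuthorized ? Decision.SIGNED_USER_BOUND : Decision.DENY;
    }
}
```

 Note that for tier 2 the viewerAuthorized flag is ignored by design: the content is public to view, and the signature exists only to make the link expire.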
 
 
 9. Signed URLs — how they actually work [v2 — new, the mechanism] 
 A signed URL = a normal link with a stamp made from three things: 
 stamp (x-sig) = HMAC(secret, path + expiry)
 
 
 path — the object key being requested 
 expiry ( x-expires ) — unix time the link dies 
 secret — a key known ONLY to File Thunder/backend AND the CDN. Never in the URL, browser, or JSON. Lives in Vault. 
 
 Who does what, and when — three separate moments: 
 
 Upload time: File Thunder makes the variants, stores them in MinIO. CDN not involved. 
 Feed-build time (signing): backend computes x-sig from (path + expiry + secret) . Pure string math — no file opened, no CDN contacted, no storage touched. It's just writing a stamped link. (You can sign a path whose bytes don't even exist yet — the math only sees the string.) 
 Playback time (serving): the phone requests the stamped URL → CDN re-computes the stamp with its copy of the secret. Match + not expired → serve. First viewer: CDN lazily fetches the file from MinIO and caches it. Later viewers: served from cache. 
 
    upload               feed build               playback
 ┌────────┐   store    ┌──────────┐  stamped   ┌──────────┐
 │ File   │ ─────────▶ │ Backend  │ ─ link ──▶ │   CDN    │
 │ Thunder│  to MinIO  │  signs   │            │ validates│
 └────────┘            └──────────┘            └────┬─────┘
                     secret ▲                       │ first view
                      Vault └──────── same ─────────┘ fetch+cache
                                     secret           from MinIO
 
 Tamper resistance (why a stripped/edited link fails): 
 
 No sig → CDN rejects (paths require a valid sig) → 403. 
 Fake sig → recompute doesn't match → 403. 
 Edited expiry → sig was computed from the old expiry, no longer matches → 403. 
 Swapped path → sig bound to original path → 403. 
 Untouched + in time → serve. After x-expires → even the real link 403s. 
 
 Forging anything requires the secret , which never leaves backend + CDN. Protect the
secret (Vault) and the whole scheme holds. 
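
 To make the mechanism concrete, here is a minimal sketch of both halves: minting on the backend and validating at the edge. It assumes HMAC-SHA256 with a hex-encoded stamp and a newline between path and expiry (so that "path + expiry" cannot be read two ways); the class UrlSigner and its method names are hypothetical, and the real CDN-side check would run in the edge's own runtime rather than in Java.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;

/** Hypothetical helper: stamps and checks delivery URLs as described in this section. */
final class UrlSigner {
    private final byte[] secret;

    UrlSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    /** x-sig = HMAC-SHA256(secret, path + "\n" + expiry), hex-encoded. */
    String sign(String path, long expiresEpochSec) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] payload = (path + "\n" + expiresEpochSec).getBytes(StandardCharsets.UTF_8);
            return HexFormat.of().formatHex(mac.doFinal(payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Backend side: pure string math, no storage or CDN call. */
    String mint(String path, long expiresEpochSec) {
        return path + "?x-expires=" + expiresEpochSec + "&x-sig=" + sign(path, expiresEpochSec);
    }

    /** Edge side: recompute the stamp, compare in constant time, then check the clock. */
    boolean isValid(String path, long expiresEpochSec, String sig, long nowEpochSec) {
        if (sig == null) return false;
        byte[] expected = sign(path, expiresEpochSec).getBytes(StandardCharsets.UTF_8);
        byte[] given = sig.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, given) && nowEpochSec <= expiresEpochSec;
    }
}
```

 Each tamper case in the list above corresponds to one failing branch of isValid. For tier 3, the x-uid value would also have to be part of the signed payload; otherwise a recipient could swap the uid without invalidating the stamp.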
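 You rarely write the HMAC by hand; a signer does it in one call. With the MinIO SDK the
call looks like the snippet below. Note that MinIO emits S3 SigV4 query parameters
(X-Amz-Expires, X-Amz-Signature) on the MinIO host, so links on media.nexgate.co that
carry x-expires / x-sig need a CDN-side signer whose format matches what the edge validates. 
 // ── Mint a presigned GET URL with the MinIO Java SDK ──
import io.minio.GetPresignedObjectUrlArgs;
import io.minio.http.Method;
import java.util.concurrent.TimeUnit;

String url = minioClient.getPresignedObjectUrl(
 GetPresignedObjectUrlArgs.builder()
 .method(Method.GET)
 .bucket(bucket)
 .object(objectKey)
 .expiry(seconds, TimeUnit.SECONDS) // becomes X-Amz-Expires; signature is X-Amz-Signature
 .build());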
 
 
 10. URL Strategy 
 Tier 1 — trivial public → permanent, unsigned, cache hard 
 https://media.nexgate.co/profiles/usr_123/avatar_400.webp
https://media.nexgate.co/posts/usr_123/vid_789/thumbnail.webp
 
 Cache-Control: public, max-age=31536000 . Scraping these gains nothing. 
 Tier 2 — public content → signed, short-ish expiry 
 https://media.nexgate.co/posts/usr_123/vid_789/720p_clean.mp4?x-expires=1780297200&x-sig=8f3a9c
 
 Tier 3 — private / paid → signed + user-bound + auth check first 
 https://media.nexgate.co/private/messages/conv_abc/file_789/original.jpg
 ?x-expires=...&x-sig=...&x-uid=usr_xyz
 
 
 [v2 FIX] Domain is .co . All URLs use media.nexgate.co . v1 mixed .com / .co —
that produces broken links. One domain everywhere. 
 
 
 [v2 FIX] Cache-key gotcha. Never let the CDN cache a signed object using a cache key
that includes x-sig / x-uid: that fragments the cache per link and per user. A cached
object can also outlive its stamp: if signature and expiry are checked only on the way to
origin, a cache hit is served without any check, so validation must run at the edge before
the cache lookup. Rules: tier 1 cache hard; tier 2 cache by path only (strip query from
cache key), short edge TTL; tier 3 never on shared cache. 
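
 The three cache rules reduce to one small function: derive the shared-cache key from the tier and the request URI. The sketch below is illustrative only; CacheKeys and its Tier enum are hypothetical names, and on Cloudflare the equivalent would be expressed as cache rules or edge code rather than Java.

```java
import java.util.Optional;

/** Hypothetical helper: picks the shared-cache key for a request, or none. */
final class CacheKeys {
    enum Tier { TRIVIAL_PUBLIC, PUBLIC_CONTENT, PRIVATE_PAID }

    private CacheKeys() {}

    /**
     * Returns the key to use in the shared cache, or empty when the response
     * must bypass shared cache entirely (tier 3).
     * The query string (x-expires, x-sig, x-uid) is never part of the key.
     */
    static Optional<String> sharedCacheKey(Tier tier, String requestUri) {
        if (tier == Tier.PRIVATE_PAID) {
            return Optional.empty();
        }
        int queryStart = requestUri.indexOf('?');
        String pathOnly = queryStart < 0 ? requestUri : requestUri.substring(0, queryStart);
        return Optional.of(pathOnly);
    }
}
```

 Two viewers holding different stamps for the same tier-2 rung therefore map to the same cache entry, which keeps the hit rate intact as long as the stamp itself is validated before the lookup.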
 
 Signed URL params 
 
 
 
 Param      Purpose
 ---------  ----------------------------------------------------------------
 x-expires  link dies after TTL
 x-sig      HMAC tamper protection
 x-uid      bound to user (tier 3): shared link won't work for someone else
 
 
 
 Who generates / validates 
 Backend (has secret) generates . Cloudflare edge (same secret) validates . MinIO never
seen by clients. 
 
 11. Streaming Token Strategy: MP4 vs HLS [v2 — new] 
 Expiry is checked when each request opens , not continuously. An in-flight transfer
completes even if the clock passes mid-download. The two formats differ because of how many
requests a single playback makes. 
  SHORT MP4                        LONG HLS
  one request                      hundreds of requests
 ┌──────────┐                     ┌──────────┐  ┌──┬──┬──┬──┐
 │ 720p.mp4 │ ← 1 signed URL      │master.m3u│─▶│ts│ts│ts│..│
 └──────────┘   few-hr TTL        └──────────┘  └──┴──┴──┴──┘
  opens once,                      short token   long token
  rides to end                     (~1 min)      (> video len)
                                   per-session   one, shared
 
 Short video (MP4) — one signed URL 
 One file = one request (or a few byte-ranges, all opened up front). A few-hour expiry is
plenty; the in-flight request completes, so mid-play expiry basically never happens.
→ mint one signed URL per rung, few-hour TTL. No dual-token, no single-use. 
 Long video (HLS) — dual-token (or prefix-signed) 
 HLS = hundreds of segment requests (one every ~2s), so expiry is re-checked constantly. A
single short token would die mid-video. Industry-standard fix (Google Media CDN / CloudFront): 
 
 master.m3u8 → short token (~1 min). Per-session entry gate. Fetched once at start. 
 child manifests + .ts segments → long token (> full video length, up to ~1 day), carried by signed cookie or path-prefix signing (sign the folder, not each segment). 
 Segments stay cacheable (same bytes for everyone); only the manifest is per-session. Preserves cache hit rate. 
 
 
 At our stage on Cloudflare Free/Pro, do the simpler version: backend mints a long
prefix-signed token for the whole hls/ folder when the post loads. Graduate to true
edge dual-token (short→long exchange) if/when we move to a heavier CDN. 
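
 Prefix signing differs from per-file signing in one respect: the stamp covers a folder, and any request whose path sits under that folder is accepted until expiry. A minimal sketch, assuming HMAC-SHA256 over prefix + newline + expiry; the class HlsPrefixSigner, its methods, and the rule that a prefix must end in "/" are assumptions, not an existing CDN feature.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;

/** Hypothetical helper: one stamp for a whole hls/ folder instead of one per segment. */
final class HlsPrefixSigner {
    private final byte[] secret;

    HlsPrefixSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Stamp over the folder prefix and expiry; segment names are not part of the payload. */
    String sign(String prefix, long expiresEpochSec) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] payload = (prefix + "\n" + expiresEpochSec).getBytes(StandardCharsets.UTF_8);
            return HexFormat.of().formatHex(mac.doFinal(payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Accepts any path under the signed prefix while the stamp is valid and unexpired. */
    boolean isValid(String requestPath, String prefix, long expiresEpochSec, String sig, long nowEpochSec) {
        if (sig == null || !prefix.endsWith("/")) return false;
        // Reject traversal so "prefix/../other" cannot escape the signed folder.
        if (requestPath.contains("..") || !requestPath.startsWith(prefix)) return false;
        byte[] expected = sign(prefix, expiresEpochSec).getBytes(StandardCharsets.UTF_8);
        byte[] given = sig.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, given) && nowEpochSec <= expiresEpochSec;
    }
}
```

 The expiry passed to sign should exceed the full video length, as described above, so that the last segment request still opens before the stamp dies.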
 
 Single-use — downloads ONLY, never streams 
 A stream is many requests; making the token single-use would 403 the viewer one segment in.
 Streams rely on expiry (plus x-uid binding on tier 3) instead. Single-use (Redis) is correct only
for one-file-one-request downloads (§12, digital products). 
 Edge case (both formats) 
 User pauses past expiry, then seeks → player fires a fresh request with a dead URL → 403.
 Client safety net: catch the 403, silently re-fetch a fresh URL from backend, resume. 
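
 The client safety net amounts to "on 403, ask the backend for a fresh URL once and retry". A minimal sketch follows; RefreshingFetcher, Response, and the two functional parameters are hypothetical stand-ins for the player's HTTP layer and the backend call, and a real player would resume from the current byte or segment offset.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical client-side wrapper: retries once with a fresh signed URL after a 403. */
final class RefreshingFetcher {
    /** Result of one media request: HTTP status plus body (null when the request failed). */
    record Response(int status, byte[] body) {}

    private RefreshingFetcher() {}

    static Response fetchWithRefresh(String url,
                                     Supplier<String> freshUrlFromBackend,
                                     Function<String, Response> http) {
        Response first = http.apply(url);
        if (first.status() != 403) {
            return first; // still valid, or a failure that re-signing would not fix
        }
        // Stamp expired: fetch a new signed URL and retry exactly once to avoid loops.
        return http.apply(freshUrlFromBackend.get());
    }
}
```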
 
 
 
 
                  Short MP4              Long HLS
 ---------------  ---------------------  --------------------------
 Requests / play  one (few byte-ranges)  hundreds
 Token            single signed URL      dual-token / prefix-signed
 TTL              few hours              > watch duration
 Single-use       no                     no
 
 
 
 
 12. Watermark & Download Path [v2 FIX] 
 Decision: Option B — downloads are watermarked. (Full build, not the launch shortcut.) 
 The feed NEVER contains a watermarked URL. Feed = clean playback URLs only. Watermarked
download is a separate endpoint , minted on tap. 
 User taps Download
 → GET /media/{fileId}/download?quality=720p (quality optional, default 720p)
 → backend: check canDownload (post setting) + tier (private/paid → auth check)
 → mint signed URL → {quality}_watermarked.mp4
 short TTL (~10 min), x-uid bound, single-use (Redis x-once)
 → log download event (orderId/buyerId/ip/device/country if digital)
 → return { downloadUrl, expiresIn, singleUse: true }
Client downloads. URL marked used in Redis → 403 on reuse.
 
 Default quality, no picker (short video). Serve one default rung ( 720p ) — matches
TikTok/IG, sane on TZ data. Endpoint takes optional quality so a picker is an additive
change later (long-form only, where the size gap matters). 
 Storage decision: store one watermarked rung (720p), not all three. If a picker is
added later, lazy-generate other rungs on first request and cache — only spend
CPU/storage on a quality someone actually downloads. 
 Timing: watermarked 720p is generated at processing time (pre-ready), so download is
instant — no first-user FFmpeg penalty. 
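 // ── Download endpoint (picker-ready, single-use, watermarked) ──
// imports: jakarta.servlet.http.HttpServletRequest, java.time.Duration, java.util.Map, java.util.Set
private static final Set<String> QUALITIES = Set.of("360p", "720p", "1080p");

@GetMapping("/media/{fileId}/download")
ResponseEntity<?> download(@PathVariable String fileId,
 @RequestParam(defaultValue = "720p") String quality,
 @AuthenticationPrincipal AccountEntity user,
 HttpServletRequest request) { // needed by the audit log below
 // x-uid binding needs a known user, so anonymous callers are rejected
 if (user == null) return ResponseEntity.status(401).build();
 // quality ends up in the object key: accept only known rungs
 if (!QUALITIES.contains(quality)) return ResponseEntity.badRequest().build();
 MediaEntity m = mediaService.findReady(fileId);

 // ── access gate (tier 2 = open; tier 3 = ownership/visibility) ──
 if (!m.isCanDownload()) return ResponseEntity.status(403).build();
 if (m.isRestricted() && !accessService.canAccess(user, m))
 return ResponseEntity.status(403).build();

 // ── mint watermarked, user-bound, single-use URL ──
 Duration ttl = Duration.ofMinutes(10);
 String key = m.watermarkedKey(quality); // lazy-generate if absent
 String url = mediaService.mintDownloadUrl(key, user.getId(), ttl);
 downloadLog.record(fileId, user, request); // audit
 return ResponseEntity.ok(Map.of("downloadUrl", url, "expiresIn", ttl.toSeconds(), "singleUse", true));
}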
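
 The single-use check itself is a first-writer-wins operation. The sketch below uses an in-memory map as a stand-in for Redis so it can run on its own; in Redis the same effect comes from SET key value NX EX ttl, where only the first caller gets OK. The class SingleUseTokens and its method are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-use registry; the map stands in for Redis in this sketch. */
final class SingleUseTokens {
    // signature -> expiry (epoch seconds); a real store would evict entries after expiry
    private final ConcurrentHashMap<String, Long> redeemed = new ConcurrentHashMap<>();

    /**
     * Returns true only for the first redemption of a signature that has not expired.
     * putIfAbsent is atomic, so two concurrent requests cannot both win.
     */
    boolean redeem(String sig, long expiresEpochSec, long nowEpochSec) {
        if (nowEpochSec > expiresEpochSec) {
            return false; // dead link: reject without recording it
        }
        return redeemed.putIfAbsent(sig, expiresEpochSec) == null;
    }
}
```

 Because a stream issues many requests against the same stamp, this check would reject the second segment, which is why it is applied to downloads only.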
 
 
 13. Feed Response Shape [v2 — new] 
 Rules baked in: 
 
 Batch-mint inline. While building the page (already looping posts), mint every signed URL right there — no per-item round trip from the client. 
 Store objectKey , mint URL per request. Never persist a signed URL (it expires). 
 Tier 1 unsigned (avatar, thumb, poster, preview3s); tier 2 signed (full image, video rungs). 
 Placeholders filled (blurhash, lqip, dominantColor) + width / height / duration present. 
 No watermarked URL here (§12). canDownload flag tells the client whether to show the button. 
 
 {
 "data": [{
 "id": "0e67aef4-...",
 "author": {
 "id": "ad03431a-...", "userName": "juliusdev_", "verified": true,
 "profilePictureUrl": "https://media.nexgate.co/profiles/ad03431a-.../avatar_400.webp"
 },
 "content": "Testing video media post 🎬",
 "shareCode": "waBbP82h",
 "media": [{
 "id": "436bb62b-...",
 "mediaType": "VIDEO",
 "status": "READY",
 "width": 1080, "height": 1920, "aspectRatio": "9:16", "duration": 28,
 "isReelEligible": true,
 "streamingFormat": "MP4",
 "canDownload": true,
 "dominantColor": "#1a1a2e",
 "blurhash": "L6Pj0^i_.AyE_3t7t7R**0o#DgR4",
 "lqip": "data:image/webp;base64,UklGRjoAAAB...",
 "poster": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../thumbnail.webp",
 "preview3s": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../preview_3s.mp4",
 "variants": {
 "360p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../360p_clean.mp4?x-expires=1780297200&x-sig=a1f2b3",
 "720p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../720p_clean.mp4?x-expires=1780297200&x-sig=b2e3c4",
 "1080p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../1080p_clean.mp4?x-expires=1780297200&x-sig=c3f4d5"
 },
 "order": 1
 }],
 "privacySettings": { "visibility": "PUBLIC" }
 }]
}
 
 (HLS video: streamingFormat: "HLS" , replace variants with a single signed/prefix-signed master.m3u8 .) 
 
 14. Security & Anti-Scraping [v2 — expanded] 
 14.1 Close the visibility-bypass hole 
 v1's permanent unsigned URLs meant a PRIVATE post's bytes were reachable by anyone with the
link — the API hid the post, the file did not. Fix: 
 
 Remove public-read from MinIO buckets (already a stated rule — enforce it). 
 Route all access through signed CDN URLs. 
 Private/followers/DM/paid → tier 3 (auth check before minting + x-uid ).
After this, a hidden post's media returns 403 to anyone but the authorized viewer. 
 
 14.2 Two layers, both required 
 
 Signed URLs stop permanent theft: scraped links die in hours → no permanent mirror, no hotlinking. They do NOT stop a one-time download — a valid link is a valid link. 
 Bot gate stops the act of mass-pulling: it guards the feed API door so a bot can't cheaply pull fresh links over and over. 
 Neither alone is enough. Together = scraping isn't worth it. (You cannot make it impossible — the goal is "not worth the effort for 99%, traceable if they do.") 
 
 14.3 Bot gate (the missing half) — sized to moat (§30) 
 
 App attestation — the real app computes a hard-to-fake token (TikTok's X-Bogus / msToken equivalent). A plain script can't produce it without running app code. 
 Rate limiting — one "user" pulling thousands of posts/min = bot → throttle/block. 
 Auth + behaviour — require login; flag inhuman scroll/pull patterns. 
 Escalating friction — suspicious traffic → CAPTCHA / slowdown / ban. 
 
 
 Build depth depends on the moat decision (§30). Commerce-moat → lighter bot gate is
fine for launch. Content-moat → invest earlier. 
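
 For the launch minimum (login + rate limit), a fixed-window counter per account is one simple option. This is a sketch under assumptions: FeedRateLimiter, the window size, and the per-account key are illustrative, the state is in-memory rather than shared across instances, and a fixed window is only one of several limiter designs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical fixed-window limiter for the feed API, keyed by account id. */
final class FeedRateLimiter {
    private record Window(long start, int count) {}

    private final int maxPerWindow;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FeedRateLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Counts one request; returns false once the account exceeds the limit in this window. */
    boolean allow(String accountId, long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        // compute() is atomic per key, so concurrent requests are counted exactly once each.
        Window updated = windows.compute(accountId, (id, old) ->
            (old == null || old.start() != windowStart)
                ? new Window(windowStart, 1)
                : new Window(old.start(), old.count() + 1));
        return updated.count() <= maxPerWindow;
    }
}
```

 A rejected request is the natural hook for the escalating friction above (slowdown, CAPTCHA, ban); in a multi-instance deployment the counters would have to live in a shared store such as Redis.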
 
 14.4 Protection layers (reference) 
 Content-Disposition: inline · dynamic URL via JS · signed expiring URLs · session/device
binding · disable right-click (frontend) · watermark on download · HLS chunking. 
 
 15. OG (Open Graph) Strategy 
 OG images + og_preview.mp4 are permanent, unsigned, tier 1 — because crawlers
(WhatsApp/Telegram) fetch on share and the user may open hours later; a signed URL would
expire → "Error loading." Access control for private posts is enforced server-side (the
post page returns 403), not via the OG asset. Serve og_clean.webp to known platforms,
 og_play.webp (burned play button) to unknown crawlers (User-Agent detection). 
 
 16. Compression & Transcoding (reference) 
 Client: light pre-compress (max 1080p, CRF 18, never upscale). Server: FFprobe → adaptive
ladder (never upscale, never inflate → -c:v copy if output > input). CRF 21/23/25/28 for
1080/720/480/360, 30 for OG preview. 1080p max (mainstream short-video feeds rarely serve above 1080p).
 -movflags +faststart always; frag_keyframe+empty_moov+faststart for fMP4.
Now: H.264 + AAC. Future: AV1 + Opus + CMAF. 
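
 The "never upscale" rule and the MP4/HLS split from §6 can be written as two small pure functions. A sketch under assumptions: Ladder and its method names are hypothetical, sourceHeight means the dimension the rung names refer to, and keeping the native height for sources below 360p is an assumed policy not stated in this guide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks output rungs and the streaming format from FFprobe values. */
final class Ladder {
    private static final int[] RUNGS = {360, 720, 1080};
    private static final double SHORT_VIDEO_LIMIT_SECONDS = 180.0; // 3 min, per §6.2 / §6.3

    private Ladder() {}

    /** Returns every rung that does not exceed the source height (never upscale). */
    static List<Integer> rungsFor(int sourceHeight) {
        List<Integer> rungs = new ArrayList<>();
        for (int rung : RUNGS) {
            if (rung <= sourceHeight) {
                rungs.add(rung);
            }
        }
        if (rungs.isEmpty()) {
            rungs.add(sourceHeight); // assumed: tiny sources keep their native height
        }
        return rungs;
    }

    /** Short videos (duration < 3 min) ship as MP4, everything else as HLS. */
    static String streamingFormat(double durationSeconds) {
        return durationSeconds < SHORT_VIDEO_LIMIT_SECONDS ? "MP4" : "HLS";
    }
}
```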
 
 17. BlurHash, LQIP & Dominant Color 
 Generated for every image + video thumbnail, at processing time. Progressive load:
dominantColor (0ms) → BlurHash (0ms) → LQIP (0ms) → thumb.webp (CDN) → full on tap.
All three stored in media_content_signals and shipped inline in the feed JSON. 
 
 18. Deduplication 
 SHA-256 on raw file → file_hashes . Exists+clean → skip processing, reference existing
variants, reference_count++ . Delete → decrement; 0 → remove from MinIO. Near-dup (pHash) =
future (copyright). 
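
 The hash-and-count bookkeeping can be sketched as follows. The in-memory map stands in for the file_hashes table, and DedupIndex with its method names is hypothetical; the "clean" scan check that gates reuse is left out for brevity.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

/** Hypothetical dedup index: SHA-256 of the raw bytes mapped to a reference count. */
final class DedupIndex {
    private final Map<String, Integer> referenceCounts = new HashMap<>();

    static String sha256Hex(byte[] raw) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(raw));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Registers one upload. Returns true when the bytes are new and must be processed. */
    synchronized boolean register(byte[] raw) {
        return referenceCounts.merge(sha256Hex(raw), 1, Integer::sum) == 1;
    }

    /** Drops one reference. Returns true when none remain and the stored variants can go. */
    synchronized boolean release(String hash) {
        Integer current = referenceCounts.get(hash);
        if (current == null) {
            return false; // unknown hash: nothing to delete
        }
        if (current > 1) {
            referenceCounts.put(hash, current - 1);
            return false;
        }
        referenceCounts.remove(hash);
        return true;
    }
}
```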
 
 19. Quota Management 
 Enforced at presigned-URL generation (atomic SQL). Social = original size charged;
digital = all stored bytes. Plans: FREE 1GB / PRO 20GB / BUSINESS 100GB. Warn 80%/95%/100%.
Release on abandoned (immediate) and hard-delete (30d after soft). Redis cache, sync to PG
every 5 min. 
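
 The atomic SQL check (UPDATE ... WHERE used+size <= quota) has a direct in-memory analogue in a compare-and-set loop, shown here because it runs without a database. This is not the SQL path itself: QuotaLedger and its methods are hypothetical, and only the check-and-reserve-in-one-step semantics and the 80/95/100 warning thresholds come from this guide.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory analogue of the atomic quota UPDATE. */
final class QuotaLedger {
    private final long quotaBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    QuotaLedger(long quotaBytes) {
        this.quotaBytes = quotaBytes;
    }

    /** Reserves size bytes only if used + size <= quota; check and update happen atomically. */
    boolean tryReserve(long size) {
        if (size < 0) return false;
        while (true) {
            long used = usedBytes.get();
            if (used + size > quotaBytes) return false;
            if (usedBytes.compareAndSet(used, used + size)) return true;
            // Another thread changed usedBytes in between: re-read and retry.
        }
    }

    /** Gives bytes back, for example after an abandoned upload or a hard delete. */
    void release(long size) {
        usedBytes.addAndGet(-size);
    }

    /** Returns the highest warning threshold reached: 0, 80, 95, or 100. */
    int warningLevel() {
        long percent = usedBytes.get() * 100 / quotaBytes;
        if (percent >= 100) return 100;
        if (percent >= 95) return 95;
        if (percent >= 80) return 80;
        return 0;
    }
}
```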
 
 20. Storage Lifecycle 
 raw: auto-delete 24h. public: keep forever, access-based tiering (hot NVMe → warm HDD → cold;
viral old content auto-promotes). private: DM 1y, audio 24h, KYC per law. digital: forever.
Soft delete → 30d grace → hard delete + Cloudflare purge + quota free. 
 
 21. Abandoned Upload Handling 
 Layer 1: client confirm (fast path). Layer 2: cleanup job every 30 min (PENDING >1h → file
exists = recover/process; not = abandoned, release quota). Layer 3: raw lifecycle 24h.
 fileId in object key enables recovery without confirm. Redis distributed lock on the job. 
 
 22. Progress Tracking (SSE) 
 SSE (one-way, auto-reconnect, resilient on TZ networks). Redis key upload_status:{fileId} :
PENDING→UPLOADING→UPLOADED→SCANNING→PROCESSING→LIVE_PARTIAL→READY (or FAILED/QUARANTINED/
ABANDONED). Multi-instance via Redis Pub/Sub bridging emitters. 
 
 23. Communication Architecture 
 No direct HTTP between services. RabbitMQ (FT ↔ Main Backend, topic nexgate.media ):
 media.upload.request , media.upload.url.ready , media.ready , media.live.partial ,
 media.failed , media.quarantined , digital.product.ready . Kafka (FT → Rec Engine /
Analytics, topic content.new ). 
 
 [v2 FIX] Events carry objectKey s + variants, not signed URLs. MEDIA_READY ships
storage keys (e.g. posts/usr_123/vid_789/720p_clean.mp4 ) + placeholders. The Main
Backend mints signed URLs per feed request from those keys — it never stores a signed
URL. 
 
 
 24. CDN Strategy 
 Cloudflare in front of MinIO. First request → edge → MinIO → cache → serve; thereafter from
cache. Cache-Control: tier 1 = 1y; tier 2 = path-keyed short edge TTL; tier 3 / API =
 private, no-store . MinIO never publicly exposed (only Cloudflare IPs). Phases: Free →
Pro ($20) → R2 (zero egress). See §10 cache-key gotcha. 
 
 25. Shareable Links 
 Client-side nexgate.co/reels/{shareCode}?ref=whatsapp . No uid in URL (server resolves
owner from shareCode → follow prompt). ref drives share analytics. Public content = no
token needed for the page; the media on the page still follows tier rules (§8). 
 
 26. Technology Stack 
 Java 21 / Spring Boot 3.x · FFmpeg (Jaffree) · ImageMagick (IM4Java) · blurhash-java ·
ClamAV (Docker, TCP 3310) · MinIO · Cloudflare · RabbitMQ · Kafka · Redis · PostgreSQL ·
TUS · Vault ( vault.qbitspark.com ) — holds the signing secret · Docker Compose · SSE. 
 
 27. Database Schema (key tables + v2 notes) 
 media_files (
 file_id UUID PK, owner_id UUID, directory ENUM, original_name TEXT,
 object_key TEXT, -- canonical storage key [v2: keys, never signed URLs]
 mime_type TEXT, file_size BIGINT,
 status ENUM, -- PENDING..READY..QUARANTINED..ABANDONED
 variants JSONB, -- variant object_keys (NOT URLs)
 metadata JSONB, scan_result TEXT, hash TEXT,
 is_reel_eligible BOOLEAN, streaming_format ENUM,
 can_download BOOLEAN DEFAULT true, -- [v2] drives feed canDownload + download gate
 created_at TIMESTAMPTZ, ready_at TIMESTAMPTZ
);

media_content_signals (
 file_id UUID FK, has_audio BOOLEAN, audio_ref TEXT,
 dominant_color TEXT, blurhash TEXT, lqip TEXT,
 aspect_ratio TEXT, quality_score INT, width INT, height INT, duration_seconds DECIMAL
);

-- user_storage_quota, file_hashes, media_variants (HOT/WARM/COLD),
-- digital_product_files, digital_orders, digital_download_logs (as v1)
 
 Indexes: idx_media_cleanup (created_at) WHERE status='PENDING' ; unique file_hashes(hash) ;
 media_files(owner_id, status, created_at DESC) . 
 
 [v2 FIX] No signed_url column anywhere. Signed URLs are minted per request and
discarded. The DB stores only object_key + variants (keys). 
 
 
 28. Docker & Infrastructure (reference) 
 eclipse-temurin:21 + ffmpeg + imagemagick(+webp/heic). ClamAV separate container
(daemon, auto-updates). FT: 4 cores / 16GB (FFmpeg CPU-heavy). ClamAV: 1 core / 2GB.
Env: PG, Redis, RabbitMQ, Kafka, MinIO, ClamAV, VAULT_ADDR . 
 
 29. What File Thunder Does NOT Do 
 ❌ ML/AI classification · Whisper (Rec Engine) · CLIP · feed ranking · social graph ·
payments · auth · push · search · business logic · fan-out · recommendations.
Extracts audio.wav temporarily; Rec Engine fetches/transcribes/deletes. FT never knows
audio content. 
 
 30. Open Decisions (parked) 
 
 1. Moat: content vs commerce? This drives §14.3 (bot-gate depth) and §8 (how hard to
stamp tier-2 public media). If commerce is the moat (shops/payments/events), public
media scraping hurts less → lighter bot gate, tier-2 stamping is softer/optional. If
content is the moat → invest in bot gate + stamp tier-2 firmly. Decide before finalizing
anti-scraping work. 
 2. Bot gate depth for launch: minimum is login + rate limit. Full app-attestation later,
sized to (1). 
 3. HLS dual-token vs prefix-sign for launch: prefix-sign on Cloudflare now; edge
dual-token if/when heavier CDN (§11). 
 4. Download quality picker: default 720p now; picker + lazy-gen later (§12). 
 
 
 Summary — File Thunder v2 in one line 
 
 Receives raw files → ClamAV → processes into clean + watermarked variants (FFmpeg/IM) with
blurhash/lqip/dominant-color → stores object keys across 4 buckets in MinIO (never
public-read) → the Main Backend mints signed delivery URLs per request from those keys
using a three-tier policy (trivial-public unsigned, public-content signed, private/paid
signed + auth) → Cloudflare serves and caches by tier → streams use MP4-single-URL or
HLS-dual-token, downloads use single-use watermarked URLs → signed URLs + a bot gate
together make scraping not worth the effort — all without business logic, ML, or social ops. 
 
 
 File Thunder Architecture Guide v2.0 — NexGate / QBIT SPARK 
 v2 integrates media-security design review. [v2 FIX] tags mark every change from v1.