# File Thunder: Architecture & Design Guide V2

NexGate Media Engine | Version 2.0

v2 integrates the media-security refinements worked out in design review. Every change from v1 is tagged inline as [v2 FIX] so it can be reviewed individually. Section 0 lists all of them in one place; read that first.

## 0. What changed in v2 (read this first)

These are the careful corrections. Each is explained in full in its section.

- Signing is three tiers, not two. "Public = unsigned, private = signed" was wrong. Correct model: trivial public assets unsigned; public content (full images/videos) signed even though public; private/paid signed + auth. → §8
- Public content gets signed too. A signed URL is anti-scrape / anti-hotlink, not only access control. TikTok signs even public videos. → §8, §14
- The visibility-bypass hole must be closed. In v1, media URLs were permanent and unsigned, so a PRIVATE post's bytes were reachable by anyone with the link: the post was hidden by the API but the file was not. Fixed by locking the bucket (no public-read) and routing everything through signed CDN URLs. → §14
- Backend signs, CDN validates, they never talk at sign time. Signing is pure math on `(path + expiry + secret)`. No file, no CDN contact. CDN fetches from MinIO lazily on first view. → §9
- Domain is `media.nexgate.co`, not `.com`. v1 mixed both. One domain everywhere. → §10
- Object keys must be owner-correct. v1 had one user's avatar stored under another user's prefix. The owner segment in the key must match the real owner. → §3
- Store the `objectKey`, never the signed URL. Signed URLs are minted per request and thrown away. Never persisted in the DB or feed cache. → §13, §23, §27
- MP4 and HLS use different token strategies. Short MP4 = one signed URL. Long HLS = dual-token (short manifest + long segments) or prefix-signing, so playback never dies mid-video. → §11
- Single-use is for downloads only, never streams. A stream is many requests; single-use would break playback.
  Streams use short expiry + user-binding instead. → §11, §12
- Watermarked file is NEVER in the feed response. Feed = clean playback URLs only. Watermarked download comes from a separate endpoint, minted on tap. → §12, §13
- Cache gotcha: never cache a signed object keyed by its signature. A cached object can outlive its stamp; private/paid content must bypass shared cache. → §10, §11
- Bot gate is the missing half of anti-scraping. Signed URLs stop permanent theft; the bot gate stops the act of mass-pulling. v1 had no bot gate section. → §14
- Open decision parked: NexGate's moat (content vs commerce) drives how hard to stamp public media and how heavy the bot gate needs to be. → §30

## 1. What is File Thunder?

File Thunder is NexGate's dedicated media processing engine: a standalone Spring Boot microservice responsible for all file and media operations across the platform.

Core principle: The Main Backend never touches raw files. It delegates all media operations to File Thunder and acts as a thin client.

Responsibilities: receive uploads, validate (type/size/quota), virus scan, transcode, generate variants (thumbnails, WebP, HLS, MP4), watermark, store in MinIO, serve via CDN, mint signed delivery URLs, publish media-ready events.

Not: AI/ML, recommendation, social graph, payments, auth, business logic.

## 2. The Four Wheels

                          FILE THUNDER
    ┌─────────────┬─────────────┬─────────────┬─────────────┐
    │   FFmpeg    │ ImageMagick │   ClamAV    │    MinIO    │
    │   (video)   │  (images)   │ (security)  │  (storage)  │
    └─────────────┴─────────────┴─────────────┴─────────────┘
      transcode     resize +      virus         object
      HLS, w/mark   WebP, blur    scan          store + CDN

- FFmpeg (Jaffree): transcode, HLS segmenting, thumbnail/best-frame, 3s preview, watermark overlay, blur-pad, fMP4, format conversion, audio extract/normalize.
- ImageMagick (IM4Java): resize variants, format→WebP, EXIF strip (privacy), auto-orient, dominant color, LQIP, OG image, GIF→WebP, PDF→preview. Plus `blurhash-java`.
- ClamAV: virus scan every upload (TCP `clamav:3310`), hash-cache, digital products scanned twice.
- MinIO: S3-compatible storage, presigned direct uploads, origin for Cloudflare, multipart + byte-range, lifecycle policies. Never publicly exposed (only Cloudflare + File Thunder reach it).

## 3. Storage Architecture (MinIO Buckets)

Rule: 4 buckets total. Never one bucket per user.

    nexgate-raw/      ← temporary upload landing (auto-delete 24h)
    nexgate-public/   ← social content (avatars, posts, stories, events, shops)
    nexgate-private/  ← DMs, documents, KYC, temp audio
    nexgate-digital/  ← purchase-gated products

### Object Key Pattern

    {bucket}/{domain}/{ownerId}/{entityId}/{fileId}/{variant}

    nexgate-public/posts/usr_123/post_456/file_789/720p_clean.mp4
    nexgate-private/messages/conv_abc/file_789/original.jpg
    nexgate-digital/products/shop_xyz/prod_123/file_789/original/file.pdf

[v2 FIX] Owner-correct keys. The `{ownerId}` segment MUST be the real owner of the file. v1 had cases where one user's avatar lived under a different user's prefix; that leaks identity and misattributes ownership. The owner in the key = the owner in the DB. Keys are otherwise opaque (no usernames, no PII).

### nexgate-public layout (unchanged from v1)

    profiles/{userId}/{fileId}/
        avatar_400.webp, avatar_150.webp, avatar_50.webp, cover.webp
    posts/{userId}/{fileId}/
        {360,720,1080}p_clean.mp4, {360,720,1080}p_watermarked.mp4, preview_3s.mp4,
        hls/master.m3u8, hls/{360,720,1080}p/seg_*.ts,
        thumbnail.webp, og_clean.webp, og_play.webp, og_preview.mp4
    stories/{userId}/{fileId}/
        720p_clean.mp4, thumbnail.webp
    events/{accountId}/{eventId}/{fileId}/
        banner.webp, banner_mobile.webp, banner_thumb.webp
    shops/{shopId}/{productId}/{fileId}/
        large.webp, medium.webp, thumb.webp

(See §12 for the watermarked-variant storage decision: store one rung, not all three.)

### nexgate-private / nexgate-digital

As v1: messages, documents, kyc, temp audio (private); products with `original/` (never exposed) + `preview/` + `cover/` (digital).

## 4. Accepted File Formats

- Images: accept JPEG/PNG/HEIC/WebP/GIF; reject BMP/TIFF/RAW/SVG (SVG = security risk). Output always WebP.
- Videos: accept MP4/MOV/MKV/WEBM/AVI/3GP; reject WMV/FLV/VOB. Output H.264 MP4 (short) or HLS H.264 (long).
- Digital: PDF, DOCX/XLSX/PPTX, GLB/OBJ/FBX/STL, MP3/WAV/FLAC, ZIP/RAR, PSD/AI, PNG.

## 5. Upload Flow

    [1] Client-side intelligent compression (max 1080p, CRF 18 light, HEIC→JPEG)
    [2] POST /media/upload-request { fileName, fileSize, mimeType, directory, clientMeta }
    [3] File Thunder: validate MIME + size + quota (atomic SQL) → DB record PENDING
        → presigned MinIO URL (nexgate-raw) → { fileId, uploadUrl, expiresIn: 1800 }
    [4] Client uploads directly to MinIO (TUS resumable / multipart >5MB)
    [5] Client confirms POST /media/confirm { fileId } (cleanup job recovers if missed)
    [6] Processing pipeline starts

    client compress
          │
          ▼
    POST upload-request ──▶ validate + quota ──▶ presigned URL
          │
          ▼
    upload direct to MinIO (raw)
          │
          ▼
    confirm ──▶ ClamAV ──▶ dedup ──▶ process ──▶ READY
          │
          ▼
    RabbitMQ media.ready ──▶ Main Backend

Quota is checked at presigned-URL time (atomic `UPDATE ... WHERE used+size <= quota`). Duration is checked AFTER upload via FFprobe (the client value is a hint, not trusted). Over limit → delete from raw, release quota, `403 DURATION_EXCEEDED`.

### Per-Upload Limits

| Plan | Image | Short Video | Long Video | Digital | Video Duration |
|---|---|---|---|---|---|
| FREE | 20MB | 200MB | ❌ | ❌ | 5 min |
| PRO | 20MB | 500MB | 2GB | 5GB | 60 min |
| BUSINESS | 20MB | 2GB | 5GB | 20GB | Unlimited |

## 6. Processing Pipelines (summary)

All pipelines: ClamAV scan → SHA-256 dedup check → process → store → delete raw → DB READY → publish events. Watermarking and all placeholders are generated here, at processing time, never at serve time.

### 6.1 Image

Auto-orient → strip EXIF (privacy) → NSFW check → dominant color + LQIP + BlurHash → size variants per content type (Post: 1600/800/300; Profile: 400/150/50; Product: 1000/500/200 square; Event: 1200×630/800×420) → WebP → OG image → store public → READY.
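Every pipeline in this section starts with the SHA-256 dedup check before any processing happens. As a minimal sketch of that step, assuming an in-memory map as a stand-in for the `file_hashes` table (the class and method names here are hypothetical, not File Thunder's actual code):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Sketch of the SHA-256 dedup check; the map stands in for the file_hashes table. */
final class DedupIndex {
    private final Map<String, Integer> referenceCounts = new HashMap<>();

    /** Hex-encoded SHA-256 of the raw upload bytes. */
    static String sha256Hex(byte[] raw) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(raw);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Returns true when the bytes are new (process them), false when existing variants can be reused. */
    boolean register(byte[] raw) {
        int count = referenceCounts.merge(sha256Hex(raw), 1, Integer::sum);
        return count == 1;
    }

    /** Current reference count for these bytes (0 if never seen). */
    int references(byte[] raw) {
        return referenceCounts.getOrDefault(sha256Hex(raw), 0);
    }
}
```

A second upload of identical bytes returns `false` from `register`, which corresponds to the "skip processing, reference existing variants" branch. The real service would also require the earlier scan result to be clean before reusing anything.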
### 6.2 Short Video (MP4, duration < 3 min)

FFprobe → adaptive transcode decision (never upscale, never inflate) → fast lane (quick 360p → `LIVE_PARTIAL`) → full processing:

- Clean variants `{360,720,1080}p_clean.mp4` (faststart).
- Non-9:16 reel-eligible → blur-pad to 9:16 (scale-fill+blur background, scale-fit sharp foreground, overlay).
- Watermarked variant(s): moving watermark (jumps corners every 3s; 80×80 logo 60% opacity + username; app-based label from `clientApp`). [v2 FIX] store one download rung (720p), not all three (§12).
- Extras: `thumbnail.webp` (best-frame scored) + its LQIP/BlurHash/dominantColor, `preview_3s.mp4` (360p muted watermarked), `og_clean.webp`, `og_play.webp`, `og_preview.mp4`.
- fMP4 (`frag_keyframe+empty_moov+faststart`) for byte-range.
- Extract `audio.wav` → nexgate-private (Rec Engine).
- READY: `isReelEligible` (<3min), `streamingFormat: MP4`, variants JSONB, aspectRatio, qualityScore, duration.

### 6.3 Long Video (HLS, duration ≥ 3 min)

Same as short EXCEPT:

- HLS adaptive (no fast lane; stays PROCESSING until done).
- `hls/{360,720,1080}p/` segments + `master.m3u8`. Segment = 2s, H.264 + AAC, MPEG-TS.
- Still makes preview_3s, thumbnail, placeholders, OG.
- `isReelEligible: false`, `streamingFormat: HLS`.

### 6.4 Digital Products

Double ClamAV scan → SHA-256 checksum → light preview by type (PDF p1 watermarked, video first 2min 480p watermarked, audio 30s 96kbps, 3D multi-angle thumbs, image 600px watermarked) → store `original/` (never exposed) + `preview/` + `cover/` → READY.

### 6.5 DM Attachments

nexgate-private, no CDN, conversation-membership check, thumb+original only, no watermark, no reel pool, signed 5-min URLs. Virus scanned like everything.

## 7. Watermarking Strategy

- Applied at processing time, once. Stored as separate `_watermarked` variant. Never at serve time, never client-side.
- Moving watermark (corners every 3s): static is croppable, moving is not.
- Served: in-app playback → clean variant; download button → watermarked variant (§12).
  Direct CDN URL → clean (unavoidable; accepted, same tradeoff every platform has).
- Pre-watermarked uploads (from other platforms): Phase 1 = no detection, just overlay ours. Phase 3 = AI logo detection → suppress reach (Instagram model), never block.
- Username change: old videos keep old username (like TikTok).

[v2 FIX] Honest scope. A watermark does NOT prevent download; it survives download. Its job is attribution + traceability, not prevention. The clean file is reachable by a determined ripper via the playback URL; that's accepted. Goal = "not worth the effort for 99% + traceable if they do."

## 8. The Three Delivery Tiers [v2 FIX: core correction]

v1 assumed two levels (public unsigned / private signed). Correct model is three:

| Tier | Content | Login to view? | URL signed? | Expiry | CDN cache |
|---|---|---|---|---|---|
| 1 (trivial public) | avatars, thumbnails, posters, blur previews | no | no | n/a | aggressive, 1y |
| 2 (public content) | full post images, video rungs | no | YES | hours | careful (§10) |
| 3 (private / paid) | DM media, private posts, digital products, KYC | yes | YES | minutes | none / private |

    ┌────────────────────────────────────────────────┐
    │ TIER 1  trivial public                         │
    │   avatars, thumbs       → unsigned, cache 1y    │
    ├────────────────────────────────────────────────┤
    │ TIER 2  public content                         │
    │   full images/video     → SIGNED, hrs TTL       │
    ├────────────────────────────────────────────────┤
    │ TIER 3  private / paid                         │
    │   DM, private, digital  → SIGNED + auth + x-uid │
    └────────────────────────────────────────────────┘
      only diff T2 vs T3 = auth check before minting

The key insight: "public to view" and "signed URL" are independent. Tier 2 content is open to watch by anyone (no login), but the file is still signed, because signing is anti-scrape / anti-hotlink, not access control. This is exactly TikTok's model (public video, yet the raw file link expires and 403s). The only difference between tier 2 and tier 3 in code is whether an auth/visibility check runs before minting; the signing itself is identical.

Tier 1 is the only unsigned tier.
Anything a scraper would actually want (full media) is tier 2+, i.e. signed.

## 9. Signed URLs: how they actually work [v2: new, the mechanism]

A signed URL = a normal link with a stamp made from three things:

    stamp (x-sig) = HMAC(secret, path + expiry)

- path: the object key being requested
- expiry (`x-expires`): unix time the link dies
- secret: a key known ONLY to File Thunder/backend AND the CDN. Never in the URL, browser, or JSON. Lives in Vault.

Who does what, and when (three separate moments):

1. Upload time: File Thunder makes the variants, stores them in MinIO. CDN not involved.
2. Feed-build time (signing): backend computes `x-sig` from `(path + expiry + secret)`. Pure string math: no file opened, no CDN contacted, no storage touched. It's just writing a stamped link. (You can sign a path whose bytes don't even exist yet; the math only sees the string.)
3. Playback time (serving): the phone requests the stamped URL → CDN re-computes the stamp with its copy of the secret. Match + not expired → serve. First viewer: CDN lazily fetches the file from MinIO and caches it. Later viewers: served from cache.

      upload              feed build               playback
    ┌────────┐  store    ┌──────────┐  stamped   ┌──────────┐
    │ File   │ ───────▶  │ Backend  │ ─ link ──▶ │ CDN      │
    │ Thunder│ to MinIO  │ signs    │            │ validates│
    └────────┘           └──────────┘            └────┬─────┘
                 secret        ▲                      │  first view
                 Vault         └──────── same ────────┘  fetch+cache
                                        secret           from MinIO

Tamper resistance (why a stripped/edited link fails):

- No sig → CDN rejects (paths require a valid sig) → 403.
- Fake sig → recompute doesn't match → 403.
- Edited expiry → sig was computed from the old expiry, no longer matches → 403.
- Swapped path → sig bound to original path → 403.
- Untouched + in time → serve. After `x-expires` → even the real link 403s.

Forging anything requires the secret, which never leaves backend + CDN. Protect the secret (Vault) and the whole scheme holds.

You don't write the HMAC.
The MinIO SDK (or CDN signer) does it in one call:

    import io.minio.GetPresignedObjectUrlArgs;
    import io.minio.http.Method;
    import java.util.concurrent.TimeUnit;

    // ── Mint a signed delivery URL ──
    String url = minioClient.getPresignedObjectUrl(
        GetPresignedObjectUrlArgs.builder()
            .method(Method.GET)
            .bucket(bucket)
            .object(objectKey)
            .expiry(seconds, TimeUnit.SECONDS)   // link lifetime → expiry + signature query params
            .build());

Note that the MinIO SDK emits AWS SigV4 query parameters (`X-Amz-Expires`, `X-Amz-Signature`) that MinIO itself verifies. The `x-expires` / `x-sig` names used in this guide apply when the CDN signer mints the link.

## 10. URL Strategy

Tier 1 (trivial public) → permanent, unsigned, cache hard

    https://media.nexgate.co/profiles/usr_123/avatar_400.webp
    https://media.nexgate.co/posts/usr_123/vid_789/thumbnail.webp

`Cache-Control: public, max-age=31536000`. Scraping these gains nothing.

Tier 2 (public content) → signed, short-ish expiry

    https://media.nexgate.co/posts/usr_123/vid_789/720p_clean.mp4?x-expires=1780297200&x-sig=8f3a9c

Tier 3 (private / paid) → signed + user-bound + auth check first

    https://media.nexgate.co/private/messages/conv_abc/file_789/original.jpg?x-expires=...&x-sig=...&x-uid=usr_xyz

[v2 FIX] Domain is `.co`. All URLs use `media.nexgate.co`. v1 mixed `.com` / `.co`, which produces broken links. One domain everywhere.

[v2 FIX] Cache-key gotcha. Never let the CDN cache a signed object using a cache key that includes `x-sig` / `x-uid`: you'd fragment cache per user, and a cached object can outlive its stamp (origin checks expiry, but a cache hit never reaches origin). Rules: tier 1 cache hard; tier 2 cache by path only (strip query from cache key), short edge TTL; tier 3 never on shared cache.

### Signed URL params

| Param | Purpose |
|---|---|
| `x-expires` | link dies after TTL |
| `x-sig` | HMAC tamper protection |
| `x-uid` | bound to user (tier 3); a shared link won't work for someone else |

### Who generates / validates

Backend (has secret) generates. Cloudflare edge (same secret) validates. MinIO never seen by clients.

## 11. Streaming Token Strategy: MP4 vs HLS [v2: new]

Expiry is checked when each request opens, not continuously. An in-flight transfer completes even if the clock passes mid-download. The two formats differ because of how many requests a single playback makes.
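Because the edge decides once, at request-open time, the whole check reduces to recomputing the §9 stamp and comparing the clock with `x-expires`. The following is a minimal sketch of that logic, not the CDN's actual implementation: the class name, the newline separator between path and expiry, and the Base64url encoding of the stamp are all assumptions made for illustration.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

/** Sketch of the stamp from section 9: x-sig = HMAC(secret, path + expiry). */
final class UrlStamp {

    /** Computes an HMAC-SHA256 stamp over the path and the expiry (unix seconds). */
    static String sign(byte[] secret, String path, long expiresEpochSec) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            // Assumption: a "\n" separator keeps path and expiry from blurring together.
            byte[] raw = mac.doFinal((path + "\n" + expiresEpochSec).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Edge-side check, evaluated once when a request opens. */
    static boolean isValid(byte[] secret, String path, long expiresEpochSec, String sig, long nowEpochSec) {
        if (sig == null || nowEpochSec > expiresEpochSec) {
            return false; // missing stamp or expired link → 403
        }
        byte[] expected = sign(secret, path, expiresEpochSec).getBytes(StandardCharsets.UTF_8);
        byte[] given = sig.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, given); // constant-time comparison
    }
}
```

Editing the path or the expiry changes the recomputed stamp, so the comparison fails without the verifier ever touching storage, which is why both signing and validation stay cheap.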

     SHORT MP4                        LONG HLS
     one request                      hundreds of requests
    ┌──────────┐                     ┌──────────┐  ┌──┬──┬──┬──┐
    │ 720p.mp4 │ ← 1 signed URL      │master.m3u│─▶│ts│ts│ts│..│
    └──────────┘   few-hr TTL        └──────────┘  └──┴──┴──┴──┘
     opens once,                      short token   long token
     rides to end                     (~1 min)      (> video len)
                                      per-session   one, shared

### Short video (MP4): one signed URL

One file = one request (or a few byte-ranges, all opened up front). A few-hour expiry is plenty; the in-flight request completes, so mid-play expiry basically never happens.

→ mint one signed URL per rung, few-hour TTL. No dual-token, no single-use.

### Long video (HLS): dual-token (or prefix-signed)

HLS = hundreds of segment requests (one every ~2s), so expiry is re-checked constantly. A single short token would die mid-video. Industry-standard fix (Google Media CDN / CloudFront):

- `master.m3u8` → short token (~1 min). Per-session entry gate. Fetched once at start.
- child manifests + `.ts` segments → long token (> full video length, up to ~1 day), carried by signed cookie or path-prefix signing (sign the folder, not each segment).
- Segments stay cacheable (same bytes for everyone); only the manifest is per-session. Preserves cache hit rate.

At our stage on Cloudflare Free/Pro, do the simpler version: backend mints a long prefix-signed token for the whole `hls/` folder when the post loads. Graduate to true edge dual-token (short→long exchange) if/when we move to a heavier CDN.

### Single-use: downloads ONLY, never streams

A stream is many requests; making the token single-use would 403 the viewer one segment in. Streams use short expiry + `x-uid` binding instead. Single-use (Redis) is correct only for one-file-one-request downloads (§12, digital products).

### Edge case (both formats)

User pauses past expiry, then seeks → player fires a fresh request with a dead URL → 403. Client safety net: catch the 403, silently re-fetch a fresh URL from backend, resume.
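The client safety net can be sketched as a single bounded retry. This is an illustration only: the `Response` type, the `httpGet` function and the `freshUrlFromBackend` supplier are hypothetical stand-ins for whatever the real player and API client provide.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Sketch of the client safety net: on 403, fetch a fresh URL once and retry. */
final class ExpiredUrlRetry {

    /** Minimal stand-in for an HTTP response; a real player would use its own type. */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Requests the URL; if the edge answers 403, asks the backend for a new link and retries once. */
    static Response fetchWithRefresh(String url,
                                     Function<String, Response> httpGet,
                                     Supplier<String> freshUrlFromBackend) {
        Response first = httpGet.apply(url);
        if (first.status != 403) {
            return first;
        }
        // Exactly one retry, so a genuinely forbidden asset cannot cause a request loop.
        return httpGet.apply(freshUrlFromBackend.get());
    }
}
```

Capping the retry at one matters: a 403 caused by expiry clears after a refresh, while a 403 caused by a failed access check will not, and the player should surface that instead of looping.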

| | Short MP4 | Long HLS |
|---|---|---|
| Requests / play | one (few byte-ranges) | hundreds |
| Token | single signed URL | dual-token / prefix-signed |
| TTL | few hours | > watch duration |
| Single-use | no | no |

## 12. Watermark & Download Path [v2 FIX]

Decision: Option B, downloads are watermarked. (Full build, not the launch shortcut.)

The feed NEVER contains a watermarked URL. Feed = clean playback URLs only. Watermarked download is a separate endpoint, minted on tap.

    User taps Download
      → GET /media/{fileId}/download?quality=720p   (quality optional, default 720p)
      → backend: check canDownload (post setting) + tier (private/paid → auth check)
      → mint signed URL → {quality}_watermarked.mp4
          short TTL (~10 min), x-uid bound, single-use (Redis x-once)
      → log download event (orderId/buyerId/ip/device/country if digital)
      → return { downloadUrl, expiresIn, singleUse: true }
    Client downloads. URL marked used in Redis → 403 on reuse.

Default quality, no picker (short video). Serve one default rung (`720p`), which matches TikTok/IG and is sane on TZ data. The endpoint takes an optional `quality` so a picker is an additive change later (long-form only, where the size gap matters).

Storage decision: store one watermarked rung (720p), not all three. If a picker is added later, lazy-generate other rungs on first request and cache, so CPU/storage is only spent on a quality someone actually downloads.

Timing: watermarked 720p is generated at processing time (pre-ready), so download is instant, with no first-user FFmpeg penalty.
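The "marked used → 403 on reuse" step depends on the first redemption being atomic. Here is a minimal sketch of that rule, assuming an in-memory map as a stand-in for Redis; with Redis the same effect usually comes from a single `SET key value NX EX ttl`. The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of single-use download links; the map stands in for Redis keys with a TTL. */
final class SingleUseLinks {
    // token id → expiry (epoch seconds) of links that were already redeemed
    private final ConcurrentMap<String, Long> used = new ConcurrentHashMap<>();

    /** Returns true only for the first redemption of a token that has not expired. */
    boolean redeem(String tokenId, long expiresEpochSec, long nowEpochSec) {
        if (nowEpochSec > expiresEpochSec) {
            return false; // expired link → 403
        }
        // putIfAbsent is atomic: exactly one concurrent caller observes null.
        return used.putIfAbsent(tokenId, expiresEpochSec) == null;
    }
}
```

Unlike Redis keys with an expiry, this map never evicts entries, so it is only suitable as an illustration of the rule, not as a production store.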

    import jakarta.servlet.http.HttpServletRequest;
    import java.time.Duration;
    import java.util.Map;
    import org.springframework.http.ResponseEntity;
    import org.springframework.security.core.annotation.AuthenticationPrincipal;
    import org.springframework.web.bind.annotation.*;

    // ── Download endpoint (picker-ready, single-use, watermarked) ──
    @GetMapping("/media/{fileId}/download")
    ResponseEntity<Map<String, Object>> download(
            @PathVariable String fileId,
            @RequestParam(defaultValue = "720p") String quality,
            @AuthenticationPrincipal AccountEntity user,
            HttpServletRequest request) {              // needed by the audit log below

        // x-uid binding needs a signed-in user (assumption: downloads require login)
        if (user == null) return ResponseEntity.status(401).build();

        MediaEntity m = mediaService.findReady(fileId);

        // ── access gate (tier 2 = open; tier 3 = ownership/visibility) ──
        if (!m.isCanDownload()) return ResponseEntity.status(403).build();
        if (m.isRestricted() && !accessService.canAccess(user, m))
            return ResponseEntity.status(403).build();

        // ── mint watermarked, user-bound, single-use URL ──
        String key = m.watermarkedKey(quality);        // lazy-generate if absent
        String url = mediaService.mintDownloadUrl(key, user.getId(), Duration.ofMinutes(10));

        downloadLog.record(fileId, user, request);     // audit
        return ResponseEntity.ok(Map.of("downloadUrl", url, "expiresIn", 600, "singleUse", true));
    }

## 13. Feed Response Shape [v2: new]

Rules baked in:

- Batch-mint inline. While building the page (already looping posts), mint every signed URL right there, with no per-item round trip from the client.
- Store `objectKey`, mint URL per request. Never persist a signed URL (it expires).
- Tier 1 unsigned (avatar, thumb, poster, preview3s); tier 2 signed (full image, video rungs).
- Placeholders filled (blurhash, lqip, dominantColor) + `width` / `height` / `duration` present.
- No watermarked URL here (§12). The `canDownload` flag tells the client whether to show the button.
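The first three rules can be sketched as a small helper that turns stored keys into URLs while the page is assembled. This is an assumption-laden illustration: `FeedUrlMinter` and its methods are hypothetical names, and the signer is passed in as a function so the sketch stays independent of how the stamp is computed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Sketch of inline minting: stored object keys in, delivery URLs out, nothing persisted. */
final class FeedUrlMinter {
    static final String BASE = "https://media.nexgate.co/";

    /** Tier 1: permanent, unsigned URL built straight from the object key. */
    static String unsigned(String objectKey) {
        return BASE + objectKey;
    }

    /** Tier 2: signs every stored variant key while the feed page is being built. */
    static Map<String, String> mintVariants(Map<String, String> variantKeys, UnaryOperator<String> signer) {
        Map<String, String> urls = new LinkedHashMap<>();
        variantKeys.forEach((rung, key) -> urls.put(rung, signer.apply(BASE + key)));
        return urls;
    }
}
```

The returned map is built per request and discarded with the response, which keeps signed URLs out of the database and the feed cache.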

    {
      "data": [{
        "id": "0e67aef4-...",
        "author": {
          "id": "ad03431a-...",
          "userName": "juliusdev_",
          "verified": true,
          "profilePictureUrl": "https://media.nexgate.co/profiles/ad03431a-.../avatar_400.webp"
        },
        "content": "Testing video media post 🎬",
        "shareCode": "waBbP82h",
        "media": [{
          "id": "436bb62b-...",
          "mediaType": "VIDEO",
          "status": "READY",
          "width": 1080,
          "height": 1920,
          "aspectRatio": "9:16",
          "duration": 28,
          "isReelEligible": true,
          "streamingFormat": "MP4",
          "canDownload": true,
          "dominantColor": "#1a1a2e",
          "blurhash": "L6Pj0^i_.AyE_3t7t7R**0o#DgR4",
          "lqip": "data:image/webp;base64,UklGRjoAAAB...",
          "poster": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../thumbnail.webp",
          "preview3s": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../preview_3s.mp4",
          "variants": {
            "360p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../360p_clean.mp4?x-expires=1780297200&x-sig=a1f2b3",
            "720p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../720p_clean.mp4?x-expires=1780297200&x-sig=b2e3c4",
            "1080p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../1080p_clean.mp4?x-expires=1780297200&x-sig=c3f4d5"
          },
          "order": 1
        }],
        "privacySettings": { "visibility": "PUBLIC" }
      }]
    }

(HLS video: `streamingFormat: "HLS"`, replace `variants` with a single signed/prefix-signed `master.m3u8`.)

## 14. Security & Anti-Scraping [v2: expanded]

### 14.1 Close the visibility-bypass hole

v1's permanent unsigned URLs meant a PRIVATE post's bytes were reachable by anyone with the link: the API hid the post, the file did not. Fix:

- Remove public-read from MinIO buckets (already a stated rule; enforce it).
- Route all access through signed CDN URLs.
- Private/followers/DM/paid → tier 3 (auth check before minting + `x-uid`).

After this, a hidden post's media returns 403 to anyone but the authorized viewer.

### 14.2 Two layers, both required

- Signed URLs stop permanent theft: scraped links die in hours → no permanent mirror, no hotlinking.
  They do NOT stop a one-time download; a valid link is a valid link.
- Bot gate stops the act of mass-pulling: it guards the feed API door so a bot can't cheaply pull fresh links over and over.

Neither alone is enough. Together = scraping isn't worth it. (You cannot make it impossible; the goal is "not worth the effort for 99%, traceable if they do.")

### 14.3 Bot gate (the missing half), sized to moat (§30)

- App attestation: the real app computes a hard-to-fake token (TikTok's X-Bogus / msToken equivalent). A plain script can't produce it without running app code.
- Rate limiting: one "user" pulling thousands of posts/min = bot → throttle/block.
- Auth + behaviour: require login; flag inhuman scroll/pull patterns.
- Escalating friction: suspicious traffic → CAPTCHA / slowdown / ban.

Build depth depends on the moat decision (§30). Commerce-moat → lighter bot gate is fine for launch. Content-moat → invest earlier.

### 14.4 Protection layers (reference)

`Content-Disposition: inline` · dynamic URL via JS · signed expiring URLs · session/device binding · disable right-click (frontend) · watermark on download · HLS chunking.

## 15. OG (Open Graph) Strategy

OG images + `og_preview.mp4` are permanent, unsigned, tier 1, because crawlers (WhatsApp/Telegram) fetch on share and the user may open hours later; a signed URL would expire → "Error loading." Access control for private posts is enforced server-side (the post page returns 403), not via the OG asset. Serve `og_clean.webp` to known platforms, `og_play.webp` (burned play button) to unknown crawlers (User-Agent detection).

## 16. Compression & Transcoding (reference)

- Client: light pre-compress (max 1080p, CRF 18, never upscale).
- Server: FFprobe → adaptive ladder (never upscale, never inflate → `-c:v copy` if output > input).
- CRF 21/23/25/28 for 1080/720/480/360, 30 for OG preview. 1080p max (all platforms serve ≤1080p anyway).
- `-movflags +faststart` always; `frag_keyframe+empty_moov+faststart` for fMP4.
- Now: H.264 + AAC. Future: AV1 + Opus + CMAF.
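The adaptive ladder rules above fit in a few lines. The sketch below uses the CRF values listed in this section; the class name, the `Rung` type and the fallback for sources smaller than 360p are assumptions, since the guide does not say what happens below the lowest rung.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the adaptive ladder: never upscale, never inflate. */
final class TranscodeLadder {

    /** One output rung: target height and its CRF. */
    static final class Rung {
        final int height;
        final int crf;

        Rung(int height, int crf) {
            this.height = height;
            this.crf = crf;
        }
    }

    // height → CRF pairs as listed in this section
    private static final int[][] LADDER = { {1080, 21}, {720, 23}, {480, 25}, {360, 28} };

    /** Returns the rungs at or below the source height, highest first (never upscale). */
    static List<Rung> plan(int sourceHeight) {
        List<Rung> rungs = new ArrayList<>();
        for (int[] r : LADDER) {
            if (r[0] <= sourceHeight) {
                rungs.add(new Rung(r[0], r[1]));
            }
        }
        if (rungs.isEmpty()) {
            // Assumption: a source below 360p keeps its own height at the lowest-rung CRF.
            rungs.add(new Rung(sourceHeight, 28));
        }
        return rungs;
    }

    /** Never inflate: prefer a stream copy when the encode came out larger than the input. */
    static boolean shouldCopyInstead(long inputBytes, long encodedBytes) {
        return encodedBytes > inputBytes;
    }
}
```

For a 720p source, `plan(720)` yields the 720, 480 and 360 rungs and skips 1080. Which of those rungs are actually stored is a separate decision (§3 lists 360/720/1080 for posts).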
## 17. BlurHash, LQIP & Dominant Color

Generated for every image + video thumbnail, at processing time. Progressive load: dominantColor (0ms) → BlurHash (0ms) → LQIP (0ms) → thumb.webp (CDN) → full on tap. All three stored in `media_content_signals` and shipped inline in the feed JSON.

## 18. Deduplication

SHA-256 on raw file → `file_hashes`. Exists+clean → skip processing, reference existing variants, `reference_count++`. Delete → decrement; 0 → remove from MinIO. Near-dup (pHash) = future (copyright).

## 19. Quota Management

Enforced at presigned-URL generation (atomic SQL). Social = original size charged; digital = all stored bytes. Plans: FREE 1GB / PRO 20GB / BUSINESS 100GB. Warn 80%/95%/100%. Release on abandoned (immediate) and hard-delete (30d after soft). Redis cache, sync to PG every 5 min.

## 20. Storage Lifecycle

- raw: auto-delete 24h.
- public: keep forever, access-based tiering (hot NVMe → warm HDD → cold; viral old content auto-promotes).
- private: DM 1y, audio 24h, KYC per law.
- digital: forever.

Soft delete → 30d grace → hard delete + Cloudflare purge + quota free.

## 21. Abandoned Upload Handling

- Layer 1: client confirm (fast path).
- Layer 2: cleanup job every 30 min (PENDING >1h → file exists = recover/process; not = abandoned, release quota).
- Layer 3: raw lifecycle 24h.

`fileId` in object key enables recovery without confirm. Redis distributed lock on the job.

## 22. Progress Tracking (SSE)

SSE (one-way, auto-reconnect, resilient on TZ networks). Redis key `upload_status:{fileId}`: PENDING→UPLOADING→UPLOADED→SCANNING→PROCESSING→LIVE_PARTIAL→READY (or FAILED/QUARANTINED/ABANDONED). Multi-instance via Redis Pub/Sub bridging emitters.

## 23. Communication Architecture

No direct HTTP between services.

RabbitMQ (FT ↔ Main Backend, topic `nexgate.media`): `media.upload.request`, `media.upload.url.ready`, `media.ready`, `media.live.partial`, `media.failed`, `media.quarantined`, `digital.product.ready`.

Kafka (FT → Rec Engine / Analytics, topic `content.new`).

[v2 FIX] Events carry `objectKey`s + variants, not signed URLs. `MEDIA_READY` ships storage keys (e.g. `posts/usr_123/vid_789/720p_clean.mp4`) + placeholders. The Main Backend mints signed URLs per feed request from those keys; it never stores a signed URL.

## 24. CDN Strategy

Cloudflare in front of MinIO. First request → edge → MinIO → cache → serve; thereafter from cache. Cache-Control: tier 1 = 1y; tier 2 = path-keyed short edge TTL; tier 3 / API = `private, no-store`. MinIO never publicly exposed (only Cloudflare IPs). Phases: Free → Pro ($20) → R2 (zero egress). See §10 cache-key gotcha.

## 25. Shareable Links

Client-side `nexgate.co/reels/{shareCode}?ref=whatsapp`. No uid in URL (server resolves owner from shareCode → follow prompt). `ref` drives share analytics. Public content = no token needed for the page; the media on the page still follows tier rules (§8).

## 26. Technology Stack

Java 21 / Spring Boot 3.x · FFmpeg (Jaffree) · ImageMagick (IM4Java) · blurhash-java · ClamAV (Docker, TCP 3310) · MinIO · Cloudflare · RabbitMQ · Kafka · Redis · PostgreSQL · TUS · Vault (`vault.qbitspark.com`), which holds the signing secret · Docker Compose · SSE.

## 27. Database Schema (key tables + v2 notes)

    media_files (
      file_id          UUID PK,
      owner_id         UUID,
      directory        ENUM,
      original_name    TEXT,
      object_key       TEXT,        -- canonical storage key [v2: keys, never signed URLs]
      mime_type        TEXT,
      file_size        BIGINT,
      status           ENUM,        -- PENDING..READY..QUARANTINED..ABANDONED
      variants         JSONB,       -- variant object_keys (NOT URLs)
      metadata         JSONB,
      scan_result      TEXT,
      hash             TEXT,
      is_reel_eligible BOOLEAN,
      streaming_format ENUM,
      can_download     BOOLEAN DEFAULT true,   -- [v2] drives feed canDownload + download gate
      created_at       TIMESTAMPTZ,
      ready_at         TIMESTAMPTZ
    );

    media_content_signals (
      file_id          UUID FK,
      has_audio        BOOLEAN,
      audio_ref        TEXT,
      dominant_color   TEXT,
      blurhash         TEXT,
      lqip             TEXT,
      aspect_ratio     TEXT,
      quality_score    INT,
      width            INT,
      height           INT,
      duration_seconds DECIMAL
    );

    -- user_storage_quota, file_hashes, media_variants (HOT/WARM/COLD),
    -- digital_product_files, digital_orders, digital_download_logs (as v1)

Indexes: `idx_media_cleanup (created_at) WHERE status='PENDING'`; unique `file_hashes(hash)`; `media_files(owner_id, status, created_at DESC)`.

[v2 FIX] No `signed_url` column anywhere. Signed URLs are minted per request and discarded. The DB stores only `object_key` + `variants` (keys).

## 28. Docker & Infrastructure (reference)

`eclipse-temurin:21` + ffmpeg + imagemagick(+webp/heic). ClamAV separate container (daemon, auto-updates). FT: 4 cores / 16GB (FFmpeg CPU-heavy). ClamAV: 1 core / 2GB. Env: PG, Redis, RabbitMQ, Kafka, MinIO, ClamAV, `VAULT_ADDR`.

## 29. What File Thunder Does NOT Do

❌ ML/AI classification · Whisper (Rec Engine) · CLIP · feed ranking · social graph · payments · auth · push · search · business logic · fan-out · recommendations.

Extracts `audio.wav` temporarily; Rec Engine fetches/transcribes/deletes. FT never knows audio content.

## 30. Open Decisions (parked)

1. Moat: content vs commerce? This drives §14.3 (bot-gate depth) and §8 (how hard to stamp tier-2 public media).
   If commerce is the moat (shops/payments/events), public media scraping hurts less → lighter bot gate, tier-2 stamping is softer/optional. If content is the moat → invest in bot gate + stamp tier-2 firmly. Decide before finalizing anti-scraping work.
2. Bot gate depth for launch: minimum is login + rate limit. Full app-attestation later, sized to (1).
3. HLS dual-token vs prefix-sign for launch: prefix-sign on Cloudflare now; edge dual-token if/when heavier CDN (§11).
4. Download quality picker: default 720p now; picker + lazy-gen later (§12).

## Summary: File Thunder v2 in one line

Receives raw files → ClamAV → processes into clean + watermarked variants (FFmpeg/IM) with blurhash/lqip/dominant-color → stores object keys across 4 buckets in MinIO (never public-read) → the Main Backend mints signed delivery URLs per request from those keys using a three-tier policy (trivial-public unsigned, public-content signed, private/paid signed + auth) → Cloudflare serves and caches by tier → streams use MP4-single-URL or HLS-dual-token, downloads use single-use watermarked URLs → signed URLs + a bot gate together make scraping not worth the effort, all without business logic, ML, or social ops.

File Thunder Architecture Guide v2.0, NexGate / QBIT SPARK