# File Thunder — Architecture & Design Guide V2
NexGate Media Engine | Version 2.0
v2 integrates the media-security refinements worked out in design review. Every change from v1 is tagged inline as [v2 FIX] so it can be reviewed individually. Section 0 lists all of them in one place — read that first.
0. What changed in v2 (read this first)
These are the careful corrections. Each is explained in full in its section.
- Signing is three tiers, not two. "Public = unsigned, private = signed" was wrong. Correct model: trivial public assets unsigned; public content (full images/videos) signed even though public; private/paid signed + auth. → §8
- Public content gets signed too. A signed URL is anti-scrape / anti-hotlink, not only access control. TikTok signs even public videos. → §8, §14
- The visibility-bypass hole must be closed. In v1, media URLs were permanent and unsigned, so a PRIVATE post's bytes were reachable by anyone with the link — the post was hidden by the API but the file was not. Fixed by locking the bucket (no public-read) and routing everything through signed CDN URLs. → §14
- Backend signs, CDN validates, they never talk at sign time. Signing is pure math on (path + expiry + secret). No file, no CDN contact. CDN fetches from MinIO lazily on first view. → §9
- Domain is media.nexgate.co, not .com. v1 mixed both. One domain everywhere. → §10
- Object keys must be owner-correct. v1 had one user's avatar stored under another user's prefix. The owner segment in the key must match the real owner. → §3
- Store the objectKey, never the signed URL. Signed URLs are minted per request and thrown away. Never persisted in the DB or feed cache. → §13, §23, §27
- MP4 and HLS use different token strategies. Short MP4 = one signed URL. Long HLS = dual-token (short manifest + long segments) or prefix-signing, so playback never dies mid-video. → §11
- Single-use is for downloads only, never streams. A stream is many requests; single-use would break playback. Streams use short expiry + user-binding instead. → §11, §12
- Watermarked file is NEVER in the feed response. Feed = clean playback URLs only. Watermarked download comes from a separate endpoint, minted on tap. → §12, §13
- Cache gotcha: never cache a signed object keyed by its signature. A cached object can outlive its stamp; private/paid content must bypass shared cache. → §10, §11
- Bot gate is the missing half of anti-scraping. Signed URLs stop permanent theft; the bot gate stops the act of mass-pulling. v1 had no bot gate section. → §14
- Open decision parked: NexGate's moat (content vs commerce) drives how hard to stamp public media and how heavy the bot gate needs to be. → §30
1. What is File Thunder?
File Thunder is NexGate's dedicated media processing engine — a standalone Spring Boot microservice responsible for all file and media operations across the platform.
Core principle: The Main Backend never touches raw files. It delegates all media operations to File Thunder and acts as a thin client.
Responsibilities: receive uploads, validate (type/size/quota), virus scan, transcode, generate variants (thumbnails, WebP, HLS, MP4), watermark, store in MinIO, serve via CDN, mint signed delivery URLs, publish media-ready events.
Not: AI/ML, recommendation, social graph, payments, auth, business logic.
2. The Four Wheels
FILE THUNDER
┌───────────┬───────────┬───────────┬───────────┐
│ FFmpeg │ ImageMagic│ ClamAV │ MinIO │
│ (video) │ (images) │ (security)│ (storage) │
└───────────┴───────────┴───────────┴───────────┘
transcode resize + virus object
HLS, w/mark WebP, blur scan store + CDN
- FFmpeg (Jaffree) — transcode, HLS segmenting, thumbnail/best-frame, 3s preview, watermark overlay, blur-pad, fMP4, format conversion, audio extract/normalize.
- ImageMagick (IM4Java) — resize variants, format→WebP, EXIF strip (privacy), auto-orient, dominant color, LQIP, OG image, GIF→WebP, PDF→preview. Plus blurhash-java.
- ClamAV — virus scan every upload (TCP clamav:3310), hash-cache, digital products scanned twice.
- MinIO — S3-compatible storage, presigned direct uploads, origin for Cloudflare, multipart + byte-range, lifecycle policies. Never publicly exposed (only Cloudflare + File Thunder reach it).
3. Storage Architecture (MinIO Buckets)
Rule: 4 buckets total. Never one bucket per user.
nexgate-raw/ ← temporary upload landing (auto-delete 24h)
nexgate-public/ ← social content (avatars, posts, stories, events, shops)
nexgate-private/ ← DMs, documents, KYC, temp audio
nexgate-digital/ ← purchase-gated products
Object Key Pattern
{bucket}/{domain}/{ownerId}/{entityId}/{fileId}/{variant}
nexgate-public/posts/usr_123/post_456/file_789/720p_clean.mp4
nexgate-private/messages/conv_abc/file_789/original.jpg
nexgate-digital/products/shop_xyz/prod_123/file_789/original/file.pdf
[v2 FIX] Owner-correct keys. The {ownerId} segment MUST be the real owner of the file. v1 had cases where one user's avatar lived under a different user's prefix — that leaks identity and misattributes ownership. The owner in the key = the owner in the DB. Keys are otherwise opaque (no usernames, no PII).
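The owner-correct rule can be checked mechanically at write time. The following is a minimal sketch, not the project's actual code: the class `ObjectKeys`, its method names, and the allowed-character rule for segments are all assumptions made for illustration. It follows the five-part posts example above and leaves the bucket to the storage client.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: builds object keys and checks the owner segment. */
final class ObjectKeys {
    // Assumption: ID segments are opaque and use only letters, digits, '_' and '-'.
    private static final Pattern SEGMENT = Pattern.compile("[A-Za-z0-9_-]+");

    private ObjectKeys() {}

    /** Builds {domain}/{ownerId}/{entityId}/{fileId}/{variant}; the bucket is passed separately. */
    static String build(String domain, String ownerId, String entityId, String fileId, String variant) {
        for (String s : List.of(domain, ownerId, entityId, fileId)) {
            if (!SEGMENT.matcher(s).matches()) {
                throw new IllegalArgumentException("bad key segment: " + s);
            }
        }
        return String.join("/", domain, ownerId, entityId, fileId, variant);
    }

    /** Returns the owner segment (second path element) of a key produced by build(). */
    static String ownerOf(String objectKey) {
        String[] parts = objectKey.split("/");
        if (parts.length < 5) {
            throw new IllegalArgumentException("not a 5-part key: " + objectKey);
        }
        return parts[1];
    }

    /** True when the owner in the key equals the owner recorded in the DB. */
    static boolean isOwnerCorrect(String objectKey, String dbOwnerId) {
        return ownerOf(objectKey).equals(dbOwnerId);
    }
}
```

A store step could refuse to write any variant for which `isOwnerCorrect(key, media.ownerId)` is false, which would have caught the v1 misplaced-avatar case.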
nexgate-public layout (unchanged from v1)
profiles/{userId}/{fileId}/ avatar_400.webp, avatar_150.webp, avatar_50.webp, cover.webp
posts/{userId}/{fileId}/ {360,720,1080}p_clean.mp4, {360,720,1080}p_watermarked.mp4,
preview_3s.mp4, hls/master.m3u8, hls/{360,720,1080}p/seg_*.ts,
thumbnail.webp, og_clean.webp, og_play.webp, og_preview.mp4
stories/{userId}/{fileId}/ 720p_clean.mp4, thumbnail.webp
events/{accountId}/{eventId}/{fileId}/ banner.webp, banner_mobile.webp, banner_thumb.webp
shops/{shopId}/{productId}/{fileId}/ large.webp, medium.webp, thumb.webp
(See §12 for the watermarked-variant storage decision — store one rung, not all three.)
nexgate-private / nexgate-digital
As v1: messages, documents, kyc, temp audio (private); products with original/ (never
exposed) + preview/ + cover/ (digital).
4. Accepted File Formats
- Images accept JPEG/PNG/HEIC/WebP/GIF; reject BMP/TIFF/RAW/SVG (SVG = security risk). Output always WebP.
- Videos accept MP4/MOV/MKV/WEBM/AVI/3GP; reject WMV/FLV/VOB. Output H.264 MP4 (short) or HLS H.264 (long).
- Digital PDF, DOCX/XLSX/PPTX, GLB/OBJ/FBX/STL, MP3/WAV/FLAC, ZIP/RAR, PSD/AI, PNG.
5. Upload Flow
[1] Client-side intelligent compression (max 1080p, CRF 18 light, HEIC→JPEG)
[2] POST /media/upload-request { fileName, fileSize, mimeType, directory, clientMeta }
[3] File Thunder: validate MIME + size + quota (atomic SQL) → DB record PENDING
→ presigned MinIO URL (nexgate-raw) → { fileId, uploadUrl, expiresIn: 1800 }
[4] Client uploads directly to MinIO (TUS resumable / multipart >5MB)
[5] Client confirms POST /media/confirm { fileId } (cleanup job recovers if missed)
[6] Processing pipeline starts
client compress
│
▼
POST upload-request ──▶ validate + quota ──▶ presigned URL
│
▼
upload direct to MinIO (raw)
│
▼
confirm ──▶ ClamAV ──▶ dedup ──▶ process ──▶ READY
│
▼
RabbitMQ media.ready ──▶ Main Backend
Quota checked at presigned-URL time (atomic UPDATE ... WHERE used+size <= quota).
Duration checked AFTER upload via FFprobe (client value is a hint, not trusted) — over
limit → delete from raw, release quota, 403 DURATION_EXCEEDED.
Per-Upload Limits
| Plan | Image | Short Video | Long Video | Digital Video | Duration |
|---|---|---|---|---|---|
| FREE | 20MB | 200MB | ❌ | ❌ | 5 min |
| PRO | 20MB | 500MB | 2GB | 5GB | 60 min |
| BUSINESS | 20MB | 2GB | 5GB | 20GB | Unlimited |
6. Processing Pipelines (summary)
All pipelines: ClamAV scan → SHA-256 dedup check → process → store → delete raw → DB READY → publish events. Watermarking and all placeholders are generated here, at processing time, never at serve time.
6.1 Image
Auto-orient → strip EXIF (privacy) → NSFW check → dominant color + LQIP + BlurHash → size variants per content type (Post: 1600/800/300; Profile: 400/150/50; Product: 1000/500/200 square; Event: 1200×630/800×420) → WebP → OG image → store public → READY.
6.2 Short Video (MP4, duration < 3 min)
FFprobe → adaptive transcode decision (never upscale, never inflate) → fast lane (quick
360p → LIVE_PARTIAL) → full processing:
- Clean variants {360,720,1080}p_clean.mp4 (faststart). Non-9:16 reel-eligible → blur-pad to 9:16 (scale-fill+blur background, scale-fit sharp foreground, overlay).
- Watermarked variant(s) — moving watermark (jumps corners every 3s; 80×80 logo 60% opacity + username; app-based label from clientApp). [v2 FIX] store one download rung (720p), not all three (§12).
- Extras: thumbnail.webp (best-frame scored) + its LQIP/BlurHash/dominantColor, preview_3s.mp4 (360p muted watermarked), og_clean.webp, og_play.webp, og_preview.mp4.
- fMP4 (frag_keyframe+empty_moov+faststart) for byte-range.
- Extract audio.wav → nexgate-private (Rec Engine).
- READY: isReelEligible (<3min), streamingFormat: MP4, variants JSONB, aspectRatio, qualityScore, duration.
6.3 Long Video (HLS, duration ≥ 3 min)
Same as short EXCEPT: HLS adaptive (no fast lane — stays PROCESSING until done).
hls/{360,720,1080}p/ segments + master.m3u8. Segment = 2s, H.264 + AAC, MPEG-TS.
Still makes preview_3s, thumbnail, placeholders, OG. isReelEligible: false, streamingFormat: HLS.
6.4 Digital Products
Double ClamAV scan → SHA-256 checksum → light preview by type (PDF p1 watermarked,
video first 2min 480p watermarked, audio 30s 96kbps, 3D multi-angle thumbs, image 600px
watermarked) → store original/ (never exposed) + preview/ + cover/ → READY.
6.5 DM Attachments
nexgate-private, no CDN, conversation-membership check, thumb+original only, no watermark, no reel pool, signed 5-min URLs. Virus scanned like everything.
7. Watermarking Strategy
- Applied at processing time, once. Stored as separate _watermarked variant. Never at serve time, never client-side.
- Moving watermark (corners every 3s) — static is croppable, moving is not.
- Served: in-app playback → clean variant; download button → watermarked variant (§12). Direct CDN URL → clean (unavoidable; accepted, same tradeoff every platform has).
- Pre-watermarked uploads (from other platforms): Phase 1 = no detection, just overlay ours. Phase 3 = AI logo detection → suppress reach (Instagram model), never block.
- Username change: old videos keep old username (like TikTok).
[v2 FIX] Honest scope. A watermark does NOT prevent download — it survives download. Its job is attribution + traceability, not prevention. The clean file is reachable by a determined ripper via the playback URL; that's accepted. Goal = "not worth the effort for 99% + traceable if they do."
8. The Three Delivery Tiers [v2 FIX — core correction]
v1 assumed two levels (public unsigned / private signed). Correct model is three:
| Tier | Content | Login to view? | URL signed? | Expiry | CDN cache |
|---|---|---|---|---|---|
| 1 — trivial public | avatars, thumbnails, posters, blur previews | no | no | n/a | aggressive, 1y |
| 2 — public content | full post images, video rungs | no | YES | hours | careful (§10) |
| 3 — private / paid | DM media, private posts, digital products, KYC | yes | YES | minutes | none / private |
┌──────────────────────────────────────────────┐
│ TIER 1 trivial public │
│ avatars, thumbs → unsigned, cache 1y │
├──────────────────────────────────────────────┤
│ TIER 2 public content │
│ full images/video → SIGNED, hrs TTL │
├──────────────────────────────────────────────┤
│ TIER 3 private / paid │
│ DM, private, digital → SIGNED + auth + x-uid │
└──────────────────────────────────────────────┘
only diff T2 vs T3 = auth check before minting
The key insight: "public to view" and "signed URL" are independent. Tier 2 content is open to watch by anyone (no login), but the file is still signed — because signing is anti-scrape / anti-hotlink, not access control. This is exactly TikTok's model (public video, yet the raw file link expires and 403s).
The only difference between tier 2 and tier 3 in code is whether an auth/visibility check runs before minting — the signing itself is identical.
Tier 1 is the only unsigned tier. Anything a scraper would actually want (full media) is tier 2+, i.e. signed.
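The tier table can be expressed as a small policy type. This is a sketch under assumptions: the enum name `DeliveryTier`, its fields, and the concrete TTLs (3 hours, 5 minutes) are placeholders standing in for the "hours" and "minutes" in the table, not decided values.

```java
import java.time.Duration;

/** Hypothetical encoding of the three delivery tiers; TTL values are illustrative only. */
enum DeliveryTier {
    TRIVIAL_PUBLIC(false, false, Duration.ZERO),         // tier 1: unsigned, no expiry
    PUBLIC_CONTENT(true, false, Duration.ofHours(3)),    // tier 2: signed, "hours"
    PRIVATE_OR_PAID(true, true, Duration.ofMinutes(5));  // tier 3: signed + auth, "minutes"

    final boolean signed;
    final boolean authBeforeMint;
    final Duration ttl;

    DeliveryTier(boolean signed, boolean authBeforeMint, Duration ttl) {
        this.signed = signed;
        this.authBeforeMint = authBeforeMint;
        this.ttl = ttl;
    }

    /** The only tier 2 vs tier 3 difference: an auth/visibility check gates minting. */
    boolean mayMint(boolean viewerAuthorized) {
        return !authBeforeMint || viewerAuthorized;
    }
}
```

Keeping the auth decision in one method makes it harder for a new endpoint to mint a tier 3 URL without running the check.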
9. Signed URLs — how they actually work [v2 — new, the mechanism]
A signed URL = a normal link with a stamp made from three things:
stamp (x-sig) = HMAC(secret, path + expiry)
- path — the object key being requested
- expiry (x-expires) — unix time the link dies
- secret — a key known ONLY to File Thunder/backend AND the CDN. Never in the URL, browser, or JSON. Lives in Vault.
Who does what, and when — three separate moments:
- Upload time: File Thunder makes the variants, stores them in MinIO. CDN not involved.
- Feed-build time (signing): backend computes x-sig from (path + expiry + secret). Pure string math — no file opened, no CDN contacted, no storage touched. It's just writing a stamped link. (You can sign a path whose bytes don't even exist yet — the math only sees the string.)
- Playback time (serving): the phone requests the stamped URL → CDN re-computes the stamp with its copy of the secret. Match + not expired → serve. First viewer: CDN lazily fetches the file from MinIO and caches it. Later viewers: served from cache.
upload feed build playback
┌────────┐ store ┌──────────┐ stamped ┌──────────┐
│ File │ ───────▶ │ Backend │ ─ link ──▶ │ CDN │
│ Thunder│ to MinIO │ signs │ │ validates│
└────────┘ └──────────┘ └────┬─────┘
secret ▲ │ first view
Vault └──────── same ────────┘ fetch+cache
secret from MinIO
Tamper resistance (why a stripped/edited link fails):
- No sig → CDN rejects (paths require a valid sig) → 403.
- Fake sig → recompute doesn't match → 403.
- Edited expiry → sig was computed from the old expiry, no longer matches → 403.
- Swapped path → sig bound to original path → 403.
- Untouched + in time → serve. After x-expires → even the real link 403s.
Forging anything requires the secret, which never leaves backend + CDN. Protect the secret (Vault) and the whole scheme holds.
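You rarely write the signing by hand. The MinIO SDK mints a presigned URL in one call, shown below. Be aware that this call produces AWS SigV4 query parameters (X-Amz-Expires, X-Amz-Signature) that MinIO itself validates; the x-expires / x-sig stamp checked at the CDN needs a signer that shares the CDN's secret.
import io.minio.GetPresignedObjectUrlArgs;
import io.minio.MinioClient;
import io.minio.http.Method;
import java.util.concurrent.TimeUnit;

// ── Mint a presigned GET URL (validated by MinIO, not by the CDN) ──
String url = minioClient.getPresignedObjectUrl(
    GetPresignedObjectUrlArgs.builder()
        .method(Method.GET)
        .bucket(bucket)
        .object(objectKey)
        .expiry(seconds, TimeUnit.SECONDS)   // → X-Amz-Expires + X-Amz-Signature
        .build());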
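To make the x-sig stamp and the tamper cases above concrete, here is a minimal sketch of both sides using the JDK's HMAC-SHA256. It is illustrative, not the production signer: the class `UrlSigner`, the newline separator between path and expiry, and the hex encoding are assumptions, and the real edge validator would run in the CDN rather than in Java.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative signer/validator for the x-expires / x-sig stamp. */
final class UrlSigner {
    private final byte[] secret;

    UrlSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    /** x-sig = hex(HMAC-SHA256(secret, path + "\n" + expiry)). The separator is an assumption. */
    String sig(String path, long expiresEpochSec) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal((path + "\n" + expiresEpochSec).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Backend side: pure string math, no storage or CDN call. */
    String sign(String path, long expiresEpochSec) {
        return path + "?x-expires=" + expiresEpochSec + "&x-sig=" + sig(path, expiresEpochSec);
    }

    /** Edge side: reject missing or expired stamps, then recompute and compare. */
    boolean isValid(String path, long expiresEpochSec, String presentedSig, long nowEpochSec) {
        if (presentedSig == null || nowEpochSec > expiresEpochSec) {
            return false;
        }
        byte[] expected = sig(path, expiresEpochSec).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, presentedSig.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the expiry and the path are both inside the signed string, editing either one changes the expected signature, which is why each tamper case in the list ends in a 403.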
10. URL Strategy
Tier 1 — trivial public → permanent, unsigned, cache hard
https://media.nexgate.co/profiles/usr_123/avatar_400.webp
https://media.nexgate.co/posts/usr_123/vid_789/thumbnail.webp
Cache-Control: public, max-age=31536000. Scraping these gains nothing.
Tier 2 — public content → signed, short-ish expiry
https://media.nexgate.co/posts/usr_123/vid_789/720p_clean.mp4?x-expires=1780297200&x-sig=8f3a9c
Tier 3 — private / paid → signed + user-bound + auth check first
https://media.nexgate.co/private/messages/conv_abc/file_789/original.jpg
?x-expires=...&x-sig=...&x-uid=usr_xyz
[v2 FIX] Domain is .co. All URLs use media.nexgate.co. v1 mixed .com/.co — that produces broken links. One domain everywhere.
[v2 FIX] Cache-key gotcha. Never let the CDN cache a signed object using a cache key that includes x-sig/x-uid — you'd fragment cache per user, and a cached object can outlive its stamp (origin checks expiry, but a cache hit never reaches origin). Rules: tier 1 cache hard; tier 2 cache by path only (strip query from cache key), short edge TTL; tier 3 never on shared cache.
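The cache-key rules above reduce to a small decision function. The sketch below only illustrates the logic; on a real CDN this lives in cache rules or edge configuration, not in Java, and the names `EdgeCacheKey` and `cacheKeyFor` are hypothetical.

```java
import java.util.Optional;

/** Hypothetical illustration of the per-tier cache-key rule. */
final class EdgeCacheKey {
    enum Tier { TRIVIAL_PUBLIC, PUBLIC_CONTENT, PRIVATE_OR_PAID }

    private EdgeCacheKey() {}

    /**
     * Returns the shared-cache key for a request, or empty when the response
     * must not be stored in a shared cache at all (tier 3).
     */
    static Optional<String> cacheKeyFor(Tier tier, String pathAndQuery) {
        if (tier == Tier.PRIVATE_OR_PAID) {
            return Optional.empty();
        }
        // Tier 1 and tier 2: key on the path only, so x-sig / x-expires never fragment the cache.
        int q = pathAndQuery.indexOf('?');
        return Optional.of(q < 0 ? pathAndQuery : pathAndQuery.substring(0, q));
    }
}
```

Two viewers holding different signatures for the same rung map to the same key, so the edge stores one copy while the signature is still checked on every request.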
Signed URL params
| Param | Purpose |
|---|---|
| x-expires | link dies after TTL |
| x-sig | HMAC tamper protection |
| x-uid | bound to user (tier 3) — shared link won't work for someone else |
Who generates / validates
Backend (has secret) generates. Cloudflare edge (same secret) validates. MinIO never seen by clients.
11. Streaming Token Strategy: MP4 vs HLS [v2 — new]
Expiry is checked when each request opens, not continuously. An in-flight transfer completes even if the clock passes mid-download. The two formats differ because of how many requests a single playback makes.
SHORT MP4 LONG HLS
one request hundreds of requests
┌──────────┐ ┌──────────┐ ┌──┬──┬──┬──┐
│ 720p.mp4 │ ← 1 signed URL │master.m3u│─▶│ts│ts│ts│..│
└──────────┘ few-hr TTL └──────────┘ └──┴──┴──┴──┘
opens once, short token long token
rides to end (~1 min) (> video len)
per-session one, shared
Short video (MP4) — one signed URL
One file = one request (or a few byte-ranges, all opened up front). A few-hour expiry is plenty; the in-flight request completes, so mid-play expiry basically never happens. → mint one signed URL per rung, few-hour TTL. No dual-token, no single-use.
Long video (HLS) — dual-token (or prefix-signed)
HLS = hundreds of segment requests (one every ~2s), so expiry is re-checked constantly. A single short token would die mid-video. Industry-standard fix (Google Media CDN / CloudFront):
- master.m3u8 → short token (~1 min). Per-session entry gate. Fetched once at start.
- child manifests + .ts segments → long token (> full video length, up to ~1 day), carried by signed cookie or path-prefix signing (sign the folder, not each segment).
- Segments stay cacheable (same bytes for everyone); only the manifest is per-session. Preserves cache hit rate.
At our stage on Cloudflare Free/Pro, do the simpler version: backend mints a long prefix-signed token for the whole hls/ folder when the post loads. Graduate to true edge dual-token (short→long exchange) if/when we move to a heavier CDN.
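Prefix signing differs from per-file signing only in what gets signed: the folder, not the segment. A minimal sketch follows, assuming HMAC-SHA256, a newline separator, and that the signed prefix travels with the request (for example in a query parameter); the class name `PrefixSigner` and those details are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative prefix signer: one token covers every object under a folder. */
final class PrefixSigner {
    private final byte[] secret;

    PrefixSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Signs the folder prefix (must end with '/') together with the expiry. */
    String sig(String prefix, long expiresEpochSec) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal((prefix + "\n" + expiresEpochSec).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Valid when the token is fresh, genuine, and the requested path sits under the signed prefix. */
    boolean isValid(String requestPath, String signedPrefix, long expiresEpochSec,
                    String presentedSig, long nowEpochSec) {
        if (presentedSig == null || nowEpochSec > expiresEpochSec) {
            return false;
        }
        // Reject traversal and anything outside the signed folder.
        if (!signedPrefix.endsWith("/") || requestPath.contains("..") || !requestPath.startsWith(signedPrefix)) {
            return false;
        }
        byte[] expected = sig(signedPrefix, expiresEpochSec).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, presentedSig.getBytes(StandardCharsets.UTF_8));
    }
}
```

One token minted for `/posts/{userId}/{fileId}/hls/` then covers the master manifest, the child manifests, and every segment, so the player never needs a fresh signature mid-video.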
Single-use — downloads ONLY, never streams
A stream is many requests; making the token single-use would 403 the viewer one segment in.
Streams use short expiry + x-uid binding instead. Single-use (Redis) is correct only
for one-file-one-request downloads (§12, digital products).
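The single-use rule for downloads can be sketched with an in-memory map standing in for Redis. This is an assumption-laden illustration: `SingleUseGate` is a hypothetical name, and in production the same "first writer wins" step would be one atomic Redis operation such as `SET key value NX EX seconds`.

```java
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the Redis single-use check (illustrative only). */
final class SingleUseGate {
    // signature -> expiry of the URL it belongs to
    private final ConcurrentHashMap<String, Long> usedUntil = new ConcurrentHashMap<>();

    /**
     * Returns true only for the first request that presents this signature.
     * The entry is kept until the URL's own expiry, after which the link is dead anyway.
     */
    boolean tryConsume(String sig, long expiresEpochSec, long nowEpochSec) {
        // Drop bookkeeping for URLs that have already expired.
        usedUntil.values().removeIf(exp -> exp < nowEpochSec);
        if (nowEpochSec > expiresEpochSec) {
            return false;
        }
        // putIfAbsent is atomic: exactly one caller sees null.
        return usedUntil.putIfAbsent(sig, expiresEpochSec) == null;
    }
}
```

Applying this gate to an HLS stream would reject the second segment request, which is the reason streams rely on short expiry and x-uid binding instead.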
Edge case (both formats)
User pauses past expiry, then seeks → player fires a fresh request with a dead URL → 403. Client safety net: catch the 403, silently re-fetch a fresh URL from backend, resume.
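The client safety net is a single retry with a freshly minted URL. The sketch below is written in Java for consistency with the rest of this guide, although the real client is a mobile app; `RefreshingFetcher` and both callbacks are hypothetical stand-ins for the player's HTTP call and the backend's re-mint endpoint.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative "catch 403, re-mint, resume" helper. */
final class RefreshingFetcher {
    private RefreshingFetcher() {}

    /**
     * @param url       the signed URL the player currently holds
     * @param fetch     performs the request and returns the HTTP status
     * @param mintFresh asks the backend for a new signed URL
     * @return the final HTTP status after at most one refresh
     */
    static int fetchWithRefresh(String url, Function<String, Integer> fetch, Supplier<String> mintFresh) {
        int status = fetch.apply(url);
        if (status != 403) {
            return status;
        }
        // Retry once only, so a genuine denial does not turn into a loop.
        return fetch.apply(mintFresh.get());
    }
}
```

Limiting the refresh to one attempt keeps a revoked or private item from hammering the backend with re-mint requests.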
| | Short MP4 | Long HLS |
|---|---|---|
| Requests / play | one (few byte-ranges) | hundreds |
| Token | single signed URL | dual-token / prefix-signed |
| TTL | few hours | > watch duration |
| Single-use | no | no |
12. Watermark & Download Path [v2 FIX]
Decision: Option B — downloads are watermarked. (Full build, not the launch shortcut.)
The feed NEVER contains a watermarked URL. Feed = clean playback URLs only. Watermarked download is a separate endpoint, minted on tap.
User taps Download
→ GET /media/{fileId}/download?quality=720p (quality optional, default 720p)
→ backend: check canDownload (post setting) + tier (private/paid → auth check)
→ mint signed URL → {quality}_watermarked.mp4
short TTL (~10 min), x-uid bound, single-use (Redis x-once)
→ log download event (orderId/buyerId/ip/device/country if digital)
→ return { downloadUrl, expiresIn, singleUse: true }
Client downloads. URL marked used in Redis → 403 on reuse.
Default quality, no picker (short video). Serve one default rung (720p) — matches
TikTok/IG, sane on TZ data. Endpoint takes optional quality so a picker is an additive
change later (long-form only, where the size gap matters).
Storage decision: store one watermarked rung (720p), not all three. If a picker is added later, lazy-generate other rungs on first request and cache — only spend CPU/storage on a quality someone actually downloads.
Timing: watermarked 720p is generated at processing time (pre-ready), so download is instant — no first-user FFmpeg penalty.
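import jakarta.servlet.http.HttpServletRequest;
import java.time.Duration;
import java.util.Map;
import org.springframework.http.ResponseEntity;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestParam;

// ── Download endpoint (picker-ready, single-use, watermarked); a method of a @RestController ──
@GetMapping("/media/{fileId}/download")
ResponseEntity<?> download(@PathVariable String fileId,
                           @RequestParam(defaultValue = "720p") String quality,
                           @AuthenticationPrincipal AccountEntity user,
                           HttpServletRequest request) {   // injected by Spring MVC, used by the audit log
    // Assumption: downloads need a logged-in user because the URL is bound to x-uid.
    if (user == null) return ResponseEntity.status(401).build();
    MediaEntity m = mediaService.findReady(fileId);
    // ── access gate (tier 2 = open; tier 3 = ownership/visibility) ──
    if (!m.isCanDownload()) return ResponseEntity.status(403).build();
    if (m.isRestricted() && !accessService.canAccess(user, m))
        return ResponseEntity.status(403).build();
    // ── mint watermarked, user-bound, single-use URL ──
    String key = m.watermarkedKey(quality);   // lazy-generate if absent
    String url = mediaService.mintDownloadUrl(key, user.getId(), Duration.ofMinutes(10));
    downloadLog.record(fileId, user, request);   // audit
    return ResponseEntity.ok(Map.of("downloadUrl", url, "expiresIn", 600, "singleUse", true));
}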
13. Feed Response Shape [v2 — new]
Rules baked in:
- Batch-mint inline. While building the page (already looping posts), mint every signed URL right there — no per-item round trip from the client.
- Store objectKey, mint URL per request. Never persist a signed URL (it expires).
- Tier 1 unsigned (avatar, thumb, poster, preview3s); tier 2 signed (full image, video rungs).
- Placeholders filled (blurhash, lqip, dominantColor) + width/height/duration present.
- No watermarked URL here (§12). canDownload flag tells the client whether to show the button.
{
"data": [{
"id": "0e67aef4-...",
"author": {
"id": "ad03431a-...", "userName": "juliusdev_", "verified": true,
"profilePictureUrl": "https://media.nexgate.co/profiles/ad03431a-.../avatar_400.webp"
},
"content": "Testing video media post 🎬",
"shareCode": "waBbP82h",
"media": [{
"id": "436bb62b-...",
"mediaType": "VIDEO",
"status": "READY",
"width": 1080, "height": 1920, "aspectRatio": "9:16", "duration": 28,
"isReelEligible": true,
"streamingFormat": "MP4",
"canDownload": true,
"dominantColor": "#1a1a2e",
"blurhash": "L6Pj0^i_.AyE_3t7t7R**0o#DgR4",
"lqip": "data:image/webp;base64,UklGRjoAAAB...",
"poster": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../thumbnail.webp",
"preview3s": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../preview_3s.mp4",
"variants": {
"360p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../360p_clean.mp4?x-expires=1780297200&x-sig=a1f2b3",
"720p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../720p_clean.mp4?x-expires=1780297200&x-sig=b2e3c4",
"1080p": "https://media.nexgate.co/posts/ad03431a-.../436bb62b-.../1080p_clean.mp4?x-expires=1780297200&x-sig=c3f4d5"
},
"order": 1
}],
"privacySettings": { "visibility": "PUBLIC" }
}]
}
(HLS video: streamingFormat: "HLS", replace variants with a single signed/prefix-signed master.m3u8.)
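The "store keys, mint per request" rule for the variants map can be sketched as a pure transformation from stored object keys to signed URLs. The names below (`FeedMinting`, `mintVariants`) are hypothetical, and the signer is passed in as a function so the sketch does not assume any particular signing implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustrative batch-mint step used while building one feed item. */
final class FeedMinting {
    private FeedMinting() {}

    /**
     * @param variantKeys rung name -> stored object key (what the DB holds)
     * @param signer      turns an object key into a signed delivery URL
     * @return rung name -> signed URL, in the same order; nothing is persisted
     */
    static Map<String, String> mintVariants(Map<String, String> variantKeys, UnaryOperator<String> signer) {
        Map<String, String> urls = new LinkedHashMap<>();
        variantKeys.forEach((rung, key) -> urls.put(rung, signer.apply(key)));
        return urls;
    }
}
```

The result goes straight into the response JSON and is then discarded; the input map of keys is the only thing that is ever stored or cached.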
14. Security & Anti-Scraping [v2 — expanded]
14.1 Close the visibility-bypass hole
v1's permanent unsigned URLs meant a PRIVATE post's bytes were reachable by anyone with the link — the API hid the post, the file did not. Fix:
- Remove public-read from MinIO buckets (already a stated rule — enforce it).
- Route all access through signed CDN URLs.
- Private/followers/DM/paid → tier 3 (auth check before minting + x-uid).
After this, a hidden post's media returns 403 to anyone but the authorized viewer.
14.2 Two layers, both required
- Signed URLs stop permanent theft: scraped links die in hours → no permanent mirror, no hotlinking. They do NOT stop a one-time download — a valid link is a valid link.
- Bot gate stops the act of mass-pulling: it guards the feed API door so a bot can't cheaply pull fresh links over and over.
- Neither alone is enough. Together = scraping isn't worth it. (You cannot make it impossible — the goal is "not worth the effort for 99%, traceable if they do.")
14.3 Bot gate (the missing half) — sized to moat (§30)
- App attestation — the real app computes a hard-to-fake token (TikTok's X-Bogus/msToken equivalent). A plain script can't produce it without running app code.
- Rate limiting — one "user" pulling thousands of posts/min = bot → throttle/block.
- Auth + behaviour — require login; flag inhuman scroll/pull patterns.
- Escalating friction — suspicious traffic → CAPTCHA / slowdown / ban.
Build depth depends on the moat decision (§30). Commerce-moat → lighter bot gate is fine for launch. Content-moat → invest earlier.
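For the launch-minimum bot gate (login plus rate limit), a per-user fixed-window counter is one simple option. The sketch below is a single-process illustration with hypothetical names and limits; a multi-instance deployment would keep the counters in a shared store such as Redis.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative per-user fixed-window rate limiter for the feed API. */
final class FixedWindowLimiter {
    private final int maxPerWindow;
    private final long windowSeconds;
    // userId -> {window index, request count in that window}
    private final ConcurrentHashMap<String, long[]> state = new ConcurrentHashMap<>();

    FixedWindowLimiter(int maxPerWindow, long windowSeconds) {
        this.maxPerWindow = maxPerWindow;
        this.windowSeconds = windowSeconds;
    }

    /** Counts this request and returns true while the user is within the limit. */
    boolean allow(String userId, long nowEpochSec) {
        long window = nowEpochSec / windowSeconds;
        // compute() is atomic per key; a fresh array is returned so the snapshot is never mutated.
        long[] snapshot = state.compute(userId, (id, prev) ->
                (prev == null || prev[0] != window)
                        ? new long[] {window, 1}
                        : new long[] {window, prev[1] + 1});
        return snapshot[1] <= maxPerWindow;
    }
}
```

A fixed window allows a short burst across a window boundary; if that matters, a sliding window or token bucket is the usual next step.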
14.4 Protection layers (reference)
Content-Disposition: inline · dynamic URL via JS · signed expiring URLs · session/device
binding · disable right-click (frontend) · watermark on download · HLS chunking.
15. OG (Open Graph) Strategy
OG images + og_preview.mp4 are permanent, unsigned, tier 1 — because crawlers
(WhatsApp/Telegram) fetch on share and the user may open hours later; a signed URL would
expire → "Error loading." Access control for private posts is enforced server-side (the
post page returns 403), not via the OG asset. Serve og_clean.webp to known platforms,
og_play.webp (burned play button) to unknown crawlers (User-Agent detection).
16. Compression & Transcoding (reference)
Client: light pre-compress (max 1080p, CRF 18, never upscale). Server: FFprobe → adaptive
ladder (never upscale, never inflate → -c:v copy if output > input). CRF 21/23/25/28 for
1080/720/480/360, 30 for OG preview. 1080p max (all platforms serve ≤1080p anyway).
-movflags +faststart always; frag_keyframe+empty_moov+faststart for fMP4.
Now: H.264 + AAC. Future: AV1 + Opus + CMAF.
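The "never upscale" ladder rule and the per-rung CRF values can be sketched as a lookup. This is an illustration only: `TranscodeLadder` is a hypothetical name, and what to do with a source below 360p (here: an empty ladder, leaving the choice to the caller) is an assumption the text above does not settle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative ladder selection: keep only rungs at or below the source height. */
final class TranscodeLadder {
    // {rung height, CRF}, using the values quoted above: 1080/720/480/360 -> 21/23/25/28.
    private static final int[][] RUNGS = { {1080, 21}, {720, 23}, {480, 25}, {360, 28} };

    private TranscodeLadder() {}

    /** Returns rung height -> CRF, highest rung first, never above the source height. */
    static Map<Integer, Integer> ladderFor(int sourceHeight) {
        Map<Integer, Integer> ladder = new LinkedHashMap<>();
        for (int[] rung : RUNGS) {
            if (rung[0] <= sourceHeight) {
                ladder.put(rung[0], rung[1]);
            }
        }
        return ladder;
    }
}
```

A 720p upload therefore yields 720, 480, and 360 rungs, and a 4K upload is capped at the 1080p rung.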
17. BlurHash, LQIP & Dominant Color
Generated for every image + video thumbnail, at processing time. Progressive load:
dominantColor (0ms) → BlurHash (0ms) → LQIP (0ms) → thumb.webp (CDN) → full on tap.
All three stored in media_content_signals and shipped inline in the feed JSON.
18. Deduplication
SHA-256 on raw file → file_hashes. Exists+clean → skip processing, reference existing
variants, reference_count++. Delete → decrement; 0 → remove from MinIO. Near-dup (pHash) =
future (copyright).
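The hash-and-reference-count flow can be sketched with an in-memory map standing in for the file_hashes table. The class and method names are hypothetical, the sketch is not thread-safe, and it leaves out the "clean" scan check that gates reuse in the real pipeline.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

/** Illustrative SHA-256 dedup index with reference counting. */
final class DedupIndex {
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Hex-encoded SHA-256 of the raw upload bytes. */
    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Adds a reference; returns true when the hash is new and the file must be processed. */
    boolean addReference(String hash) {
        return refCounts.merge(hash, 1, Integer::sum) == 1;
    }

    /** Drops a reference; returns true when the last one is gone and variants can be removed. */
    boolean removeReference(String hash) {
        if (!refCounts.containsKey(hash)) {
            return false;
        }
        // Returning null from the remapping function removes the entry.
        return refCounts.computeIfPresent(hash, (h, n) -> n > 1 ? n - 1 : null) == null;
    }
}
```

In the real service the increment and decrement would be database updates, so that two uploads of the same file cannot race on the count.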
19. Quota Management
Enforced at presigned-URL generation (atomic SQL). Social = original size charged; digital = all stored bytes. Plans: FREE 1GB / PRO 20GB / BUSINESS 100GB. Warn 80%/95%/100%. Release on abandoned (immediate) and hard-delete (30d after soft). Redis cache, sync to PG every 5 min.
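The atomic check-and-reserve can be illustrated in memory with a compare-and-set loop, which has the same "succeed only if it still fits" shape as the conditional SQL update. `QuotaCounter` is a hypothetical name, and the column names in the comment are assumptions rather than the actual schema.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * In-memory analogue of the atomic quota update. The SQL equivalent would look roughly like:
 *   UPDATE user_storage_quota SET used_bytes = used_bytes + ?
 *   WHERE user_id = ? AND used_bytes + ? <= quota_bytes
 * (used_bytes / quota_bytes are assumed column names.)
 */
final class QuotaCounter {
    private final AtomicLong used = new AtomicLong();
    private final long quota;

    QuotaCounter(long quotaBytes) {
        this.quota = quotaBytes;
    }

    /** Reserves size bytes if they fit; check and increment happen as one atomic step. */
    boolean tryReserve(long size) {
        while (true) {
            long current = used.get();
            if (size < 0 || size > quota - current) {
                return false;   // would exceed quota
            }
            if (used.compareAndSet(current, current + size)) {
                return true;    // nobody changed the counter in between
            }
            // Lost a race with another upload: re-read and try again.
        }
    }

    /** Gives bytes back, e.g. for an abandoned upload or a failed duration check. */
    void release(long size) {
        used.addAndGet(-size);
    }
}
```

Doing the comparison and the increment in one step is what prevents two concurrent upload requests from both passing a separate "is there room?" check.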
20. Storage Lifecycle
raw: auto-delete 24h. public: keep forever, access-based tiering (hot NVMe → warm HDD → cold; viral old content auto-promotes). private: DM 1y, audio 24h, KYC per law. digital: forever. Soft delete → 30d grace → hard delete + Cloudflare purge + quota free.
21. Abandoned Upload Handling
Layer 1: client confirm (fast path). Layer 2: cleanup job every 30 min (PENDING >1h → file
exists = recover/process; not = abandoned, release quota). Layer 3: raw lifecycle 24h.
fileId in object key enables recovery without confirm. Redis distributed lock on the job.
22. Progress Tracking (SSE)
SSE (one-way, auto-reconnect, resilient on TZ networks). Redis key upload_status:{fileId}:
PENDING→UPLOADING→UPLOADED→SCANNING→PROCESSING→LIVE_PARTIAL→READY (or FAILED/QUARANTINED/
ABANDONED). Multi-instance via Redis Pub/Sub bridging emitters.
23. Communication Architecture
No direct HTTP between services. RabbitMQ (FT ↔ Main Backend, topic nexgate.media):
media.upload.request, media.upload.url.ready, media.ready, media.live.partial,
media.failed, media.quarantined, digital.product.ready. Kafka (FT → Rec Engine /
Analytics, topic content.new).
[v2 FIX] Events carry objectKeys + variants, not signed URLs. MEDIA_READY ships storage keys (e.g. posts/usr_123/vid_789/720p_clean.mp4) + placeholders. The Main Backend mints signed URLs per feed request from those keys — it never stores a signed URL.
24. CDN Strategy
Cloudflare in front of MinIO. First request → edge → MinIO → cache → serve; thereafter from
cache. Cache-Control: tier 1 = 1y; tier 2 = path-keyed short edge TTL; tier 3 / API =
private, no-store. MinIO never publicly exposed (only Cloudflare IPs). Phases: Free →
Pro ($20) → R2 (zero egress). See §10 cache-key gotcha.
25. Shareable Links
Client-side nexgate.co/reels/{shareCode}?ref=whatsapp. No uid in URL (server resolves
owner from shareCode → follow prompt). ref drives share analytics. Public content = no
token needed for the page; the media on the page still follows tier rules (§8).
26. Technology Stack
Java 21 / Spring Boot 3.x · FFmpeg (Jaffree) · ImageMagick (IM4Java) · blurhash-java ·
ClamAV (Docker, TCP 3310) · MinIO · Cloudflare · RabbitMQ · Kafka · Redis · PostgreSQL ·
TUS · Vault (vault.qbitspark.com) — holds the signing secret · Docker Compose · SSE.
27. Database Schema (key tables + v2 notes)
media_files (
file_id UUID PK, owner_id UUID, directory ENUM, original_name TEXT,
object_key TEXT, -- canonical storage key [v2: keys, never signed URLs]
mime_type TEXT, file_size BIGINT,
status ENUM, -- PENDING..READY..QUARANTINED..ABANDONED
variants JSONB, -- variant object_keys (NOT URLs)
metadata JSONB, scan_result TEXT, hash TEXT,
is_reel_eligible BOOLEAN, streaming_format ENUM,
can_download BOOLEAN DEFAULT true, -- [v2] drives feed canDownload + download gate
created_at TIMESTAMPTZ, ready_at TIMESTAMPTZ
);
media_content_signals (
file_id UUID FK, has_audio BOOLEAN, audio_ref TEXT,
dominant_color TEXT, blurhash TEXT, lqip TEXT,
aspect_ratio TEXT, quality_score INT, width INT, height INT, duration_seconds DECIMAL
);
-- user_storage_quota, file_hashes, media_variants (HOT/WARM/COLD),
-- digital_product_files, digital_orders, digital_download_logs (as v1)
Indexes: idx_media_cleanup (created_at) WHERE status='PENDING'; unique file_hashes(hash);
media_files(owner_id, status, created_at DESC).
[v2 FIX] No signed_url column anywhere. Signed URLs are minted per request and discarded. The DB stores only object_key + variants (keys).
28. Docker & Infrastructure (reference)
eclipse-temurin:21 + ffmpeg + imagemagick(+webp/heic). ClamAV separate container
(daemon, auto-updates). FT: 4 cores / 16GB (FFmpeg CPU-heavy). ClamAV: 1 core / 2GB.
Env: PG, Redis, RabbitMQ, Kafka, MinIO, ClamAV, VAULT_ADDR.
29. What File Thunder Does NOT Do
❌ ML/AI classification · Whisper (Rec Engine) · CLIP · feed ranking · social graph ·
payments · auth · push · search · business logic · fan-out · recommendations.
Extracts audio.wav temporarily; Rec Engine fetches/transcribes/deletes. FT never knows
audio content.
30. Open Decisions (parked)
- Moat: content vs commerce? This drives §14.3 (bot-gate depth) and §8 (how hard to stamp tier-2 public media). If commerce is the moat (shops/payments/events), public media scraping hurts less → lighter bot gate, tier-2 stamping is softer/optional. If content is the moat → invest in bot gate + stamp tier-2 firmly. Decide before finalizing anti-scraping work.
- Bot gate depth for launch — minimum: login + rate limit. Full app-attestation later, sized to the moat decision above.
- HLS dual-token vs prefix-sign for launch — prefix-sign on Cloudflare now; edge dual-token if/when heavier CDN (§11).
- Download quality picker — default 720p now; picker + lazy-gen later (§12).
Summary — File Thunder v2 in one line
Receives raw files → ClamAV → processes into clean + watermarked variants (FFmpeg/IM) with blurhash/lqip/dominant-color → stores object keys across 4 buckets in MinIO (never public-read) → the Main Backend mints signed delivery URLs per request from those keys using a three-tier policy (trivial-public unsigned, public-content signed, private/paid signed + auth) → Cloudflare serves and caches by tier → streams use MP4-single-URL or HLS-dual-token, downloads use single-use watermarked URLs → signed URLs + a bot gate together make scraping not worth the effort — all without business logic, ML, or social ops.
File Thunder Architecture Guide v2.0 — NexGate / QBIT SPARK v2 integrates media-security design review. [v2 FIX] tags mark every change from v1.