Nexgate Recommendation System - Architecture Documentation

Version 1.0 | Social Commerce Platform


📑 Table of Contents

  1. Executive Summary
  2. System Overview
  3. Recommendation Strategies
  4. Embedding System
  5. Feed Algorithm
  6. Search System
  7. Group Buy Recommendations
  8. Installment Recommendations
  9. Technology Stack
  10. Implementation Phases
  11. Performance & Scaling
  12. Metrics & Monitoring

🎯 Executive Summary

What is Nexgate?

Nexgate is a social commerce platform that combines:

Core Philosophy

"Discovery over Search"

Recommendation System Goals

  1. Maximize Engagement: Keep users scrolling and interacting
  2. Drive Conversions: Turn views into purchases
  3. Build Community: Connect buyers through group purchases
  4. Enable Affordability: Help users buy premium products through installments
  5. Provide Value: Show relevant products at the right time

Key Success Metrics


🏗️ System Overview

High-Level Architecture

┌──────────────────────────────────────────────────────────────┐
│                      USER INTERFACE                          │
│                                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │  Feed    │  │ Product  │  │  Search  │  │  Group   │      │
│  │  (Main)  │  │  Detail  │  │ (Text/   │  │   Buy    │      │
│  │          │  │   Page   │  │ Visual)  │  │  Deals   │      │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘      │
└────────────────────────┬─────────────────────────────────────┘
                         │
                         │ REST API / WebSocket
                         ↓
┌──────────────────────────────────────────────────────────────┐
│               SPRING BOOT BACKEND (Main)                     │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │          RECOMMENDATION ENGINE LAYER                 │    │
│  │                                                      │    │
│  │  ┌───────────┐  ┌───────────┐  ┌──────────────┐      │    │
│  │  │   Feed    │  │  Search   │  │  Group Buy   │      │    │
│  │  │  Service  │  │  Service  │  │   Service    │      │    │
│  │  └───────────┘  └───────────┘  └──────────────┘      │    │
│  │                                                      │    │
│  │  ┌───────────┐  ┌───────────┐  ┌──────────────┐      │    │
│  │  │Similarity │  │Installment│  │  Engagement  │      │    │
│  │  │  Service  │  │  Service  │  │   Tracker    │      │    │
│  │  └───────────┘  └───────────┘  └──────────────┘      │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │            CORE BUSINESS SERVICES                    │    │
│  │                                                      │    │
│  │  Product Service | User Service | Order Service      │    │
│  │  Social Graph | Embedding Client | Payment Service   │    │
│  └──────────────────────────────────────────────────────┘    │
└────────────────────────┬─────────────────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
        ↓                ↓                ↓
┌──────────────┐  ┌─────────────┐  ┌────────────────┐
│ PostgreSQL   │  │    Redis    │  │    Python      │
│ + pgvector   │  │   (Cache)   │  │   Embedding    │
│              │  │             │  │    Service     │
│ • Products   │  │ • Sessions  │  │                │
│ • Users      │  │ • Feed      │  │ • Sentence     │
│ • Orders     │  │ • Seen Posts│  │   Transformers │
│ • Posts      │  │ • Hot Data  │  │ • CLIP Model   │
│ • Embeddings │  │             │  │                │
└──────────────┘  └─────────────┘  └────────────────┘

Component Responsibilities

Frontend (Mobile/Web):

Spring Boot Backend:

PostgreSQL + pgvector:

Redis:

Python Embedding Service:


📊 Recommendation Strategies Overview

Strategy Matrix

Feature             | Strategy           | Uses Embeddings    | Complexity | Priority
Main Feed           | Hybrid Scoring     | No                 | Medium     | P0 (Critical)
Product Similarity  | Vector Similarity  | Yes (Text + Image) | High       | P1 (Important)
Text Search         | Semantic Matching  | Yes (Text)         | High       | P1 (Important)
Visual Search       | Image Matching     | Yes (Image)        | High       | P2 (Nice to have)
Group Buy Match     | Interest + Urgency | Partial            | Medium     | P0 (Critical)
Group Invites       | Social Graph       | No                 | Low        | P1 (Important)
Installment Suggest | Budget Analysis    | No                 | Low        | P0 (Critical)
Trending            | Engagement Metrics | No                 | Low        | P0 (Critical)
Collaborative       | Purchase Patterns  | No                 | Medium     | P1 (Important)

When to Use What

Use Scoring System (No Embeddings):

Use Embeddings:

Use Collaborative Filtering:

Use Social Graph:


🎨 Detailed Recommendation Strategies

1. Main Feed Algorithm (Primary Experience)

Purpose: Personalized social commerce feed (like TikTok Shop/Instagram Shopping)

Composition Strategy:

Feed Mix (Per 20 posts):
├── 60% Posts from Following (12 posts)
│   • Sellers user follows
│   • Friends' posts
│   • Brands user engaged with
│
├── 25% Trending in User's Categories (5 posts)
│   • Hot products in categories user likes
│   • Viral posts with high engagement
│   • New products gaining traction
│
├── 10% Local Sellers (2 posts)
│   • Sellers in same city/region
│   • Faster delivery options
│   • Support local businesses
│
└── 5% Sponsored/Promoted (1 post)
    • Paid advertising
    • Featured products
    • Platform partnerships
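The mix above can be turned into per-page slot quotas. The sketch below is illustrative only: `FEED_MIX_PCT` and `slots_per_page` are hypothetical names, and handing any rounding remainder to the "following" bucket is an assumption, since the source only specifies the 20-post case.

```python
# Hypothetical share table taken from the feed mix above (percent of a page).
FEED_MIX_PCT = {
    "following": 60,
    "trending": 25,
    "local": 10,
    "sponsored": 5,
}


def slots_per_page(page_size: int = 20) -> dict:
    """Return how many posts each source contributes to one page."""
    slots = {
        source: pct * page_size // 100
        for source, pct in FEED_MIX_PCT.items()
        if source != "following"
    }
    # Assumption: whatever integer division leaves over goes to "following".
    slots["following"] = page_size - sum(slots.values())
    return slots


print(slots_per_page(20))
```

For a 20-post page this reproduces the 12 / 5 / 2 / 1 split; other page sizes are a judgment call the source does not settle.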

Scoring Formula:

Each post receives a score from 0 to 100. The score is the sum of five factor scores, each capped at its share of the total:

Post Score =
    Social Score (max 35 points) +
    Engagement Prediction Score (max 25 points) +
    Personalization Score (max 20 points) +
    Recency Score (max 15 points) +
    Quality Score (max 5 points)

Factor Breakdown:

A. Social Score (35 points max)

Following Seller: +15 points
Friends Engaged (per friend): +2 points (max 10)
Seller Verified: +5 points
Seller Rating (0-5 stars): +0 to 5 points

B. Engagement Prediction Score (25 points max)

Category Match (user's top 3 categories): +10 points
Price in User's Range: +8 points
Similar to Past Likes: +7 points

C. Personalization Score (20 points max)

Matches Recent Searches (per keyword): +2 points (max 8)
Local Seller (same city): +7 points
Matches User Preferences: +5 points

D. Recency Score (15 points max)

< 1 hour old: 15 points
< 6 hours old: 12 points
< 24 hours old: 8 points
< 3 days old: 4 points
Older: 1 point

E. Quality Score (5 points max)

3+ Images: +2 points
Detailed Description (>100 chars): +1 point
High Engagement Rate (>10%): +2 points
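The factor breakdown above can be sketched as a small scoring function. This is a minimal sketch, not the production scorer: the `PostSignals` fields and function names are hypothetical, and it reads the post score as the plain sum of the five capped factor scores (35 + 25 + 20 + 15 + 5 = 100).

```python
from dataclasses import dataclass


@dataclass
class PostSignals:
    """Hypothetical per-post inputs; field names are illustrative only."""
    following_seller: bool = False
    friends_engaged: int = 0
    seller_verified: bool = False
    seller_rating: float = 0.0          # 0-5 stars
    category_match: bool = False
    price_in_range: bool = False
    similar_to_past_likes: bool = False
    matched_search_keywords: int = 0
    local_seller: bool = False
    matches_preferences: bool = False
    age_hours: float = 0.0
    image_count: int = 0
    description_length: int = 0
    engagement_rate: float = 0.0        # fraction, e.g. 0.12 for 12%


def recency_score(age_hours: float) -> int:
    """Map post age to the recency points listed in section D."""
    if age_hours < 1:
        return 15
    if age_hours < 6:
        return 12
    if age_hours < 24:
        return 8
    if age_hours < 72:
        return 4
    return 1


def post_score(p: PostSignals) -> float:
    social = (
        (15 if p.following_seller else 0)
        + min(2 * p.friends_engaged, 10)          # +2 per friend, max 10
        + (5 if p.seller_verified else 0)
        + max(0.0, min(p.seller_rating, 5.0))     # 0-5 stars -> 0-5 points
    )
    engagement = (
        (10 if p.category_match else 0)
        + (8 if p.price_in_range else 0)
        + (7 if p.similar_to_past_likes else 0)
    )
    personalization = (
        min(2 * p.matched_search_keywords, 8)     # +2 per keyword, max 8
        + (7 if p.local_seller else 0)
        + (5 if p.matches_preferences else 0)
    )
    quality = (
        (2 if p.image_count >= 3 else 0)
        + (1 if p.description_length > 100 else 0)
        + (2 if p.engagement_rate > 0.10 else 0)
    )
    return social + engagement + personalization + recency_score(p.age_hours) + quality
```

Keeping each factor in its own expression makes it easy to log the per-factor contribution next to the final score when tuning the point values.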

Feed Generation Process:

Step 1: Gather Candidates
├── Query posts from following (last 7 days)
├── Query trending posts in user's categories
├── Query local sellers' posts
└── Query sponsored posts
→ Result: ~100-200 candidate posts

Step 2: Filter
├── Remove posts user already saw (Redis cache)
├── Remove out-of-stock products
├── Remove blocked/reported sellers
└── Remove duplicate products
→ Result: ~80-150 eligible posts

Step 3: Score Each Post
├── Calculate social score
├── Calculate engagement prediction
├── Calculate personalization
├── Calculate recency
└── Calculate quality
→ Result: Each post has score 0-100

Step 4: Rank & Mix
├── Sort by score (highest first)
├── Apply diversity rules:
│   • Max 3 consecutive posts from same seller
│   • Max 5 posts from same category in 20
│   • Insert 1 sponsored post at position 7-10
└── Apply freshness shuffle (boost 2-3 recent high-quality posts)
→ Result: Final ranked list

Step 5: Pagination
├── Return top 20 posts
├── Generate cursor (ID of 20th post)
├── Cache seen post IDs (Redis, 30 days TTL)
└── Track impressions for analytics

Optimization Strategies:

Caching:

Pre-computation:

Feed Diversity:

Diversity Rules:
• No more than 3 consecutive posts from same seller
• Category distribution: At least 3 different categories in top 10
• Price variance: Mix of low/mid/high price points
• Content type mix: Products, tutorials, reviews, testimonials
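The first diversity rule (no more than 3 consecutive posts from one seller) can be applied as a pass over the score-sorted list. The sketch below is one possible approach, not the documented implementation: the function name, the `seller_id` key, and the choice to defer (rather than drop) posts that would extend a run are all assumptions.

```python
from typing import Dict, List, Tuple


def limit_seller_runs(ranked: List[Dict], max_run: int = 3) -> Tuple[List[Dict], List[Dict]]:
    """Reorder a score-sorted list so that no seller fills more than
    `max_run` consecutive slots. Returns (feed, leftover)."""
    feed: List[Dict] = []
    deferred: List[Dict] = []

    def current_run(seller_id: str) -> int:
        # Count how many posts at the tail of the feed belong to this seller.
        run = 0
        for post in reversed(feed):
            if post["seller_id"] != seller_id:
                break
            run += 1
        return run

    for post in ranked:
        if current_run(post["seller_id"]) >= max_run:
            deferred.append(post)  # would extend the run; hold it back
            continue
        feed.append(post)
        # A different seller may have just broken the run, so retry held posts.
        waiting, deferred = deferred, []
        for held in waiting:
            if current_run(held["seller_id"]) < max_run:
                feed.append(held)
            else:
                deferred.append(held)
    return feed, deferred


ranked = [{"id": i, "seller_id": s} for i, s in enumerate("AAAABA")]
feed, leftover = limit_seller_runs(ranked)
print([p["id"] for p in feed])
```

Posts that can never be placed (for example, a candidate list from a single seller) come back in `leftover`; a real feed would probably carry them over to the next page.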

2. Product Similarity (Vector Embeddings)

Purpose: Show similar products on product detail page

Where Used:

How It Works:

Text-Based Similarity:

Product: "Red Nike Running Shoes"
↓
Generate text embedding from:
"Red Nike Running Shoes. Comfortable athletic footwear 
 designed for daily training and jogging. Category: Footwear."
↓
Embedding: [0.23, 0.87, 0.12, ... 384 numbers]
↓
Store in product.text_embedding column
↓
When user views product:
→ Compare this embedding with all other products
→ Find 10 closest matches using cosine similarity
→ Display as "Similar Products"

Image-Based Similarity:

Product Image: [actual shoe photo]
↓
Download image and process pixels
↓
Generate image embedding using CLIP:
[0.91, 0.12, 0.45, ... 512 numbers]
↓
Store in product.image_embedding column
↓
When user views product:
→ Compare this embedding with all other products
→ Find 10 visually similar matches
→ Display as "Visually Similar"

Combined Similarity:

Option to blend both:

Combined Score = (0.6 × Text Similarity) + (0.4 × Image Similarity)

Example:
Product A vs Product B:
• Text similarity: 0.85 (both describe athletic shoes)
• Image similarity: 0.72 (both look sporty)
• Combined: (0.6 × 0.85) + (0.4 × 0.72) = 0.51 + 0.29 = 0.80

Result: 80% similarity → Good recommendation!
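The blend above can be sketched in a few lines. Because the text embedding (384 numbers) and the image embedding (512 numbers) have different lengths, each similarity is computed in its own space first and only the two scalar results are combined. The function names are hypothetical; the 0.6 / 0.4 weights come from the formula above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def combined_similarity(text_sim: float, image_sim: float,
                        text_weight: float = 0.6) -> float:
    """Weighted blend of the two similarity scores."""
    return text_weight * text_sim + (1.0 - text_weight) * image_sim


# Worked example from the text: 0.6 * 0.85 + 0.4 * 0.72
print(round(combined_similarity(0.85, 0.72), 2))
```

In production the cosine computation would more likely run inside pgvector than in application code; the pure-Python version is only here to make the arithmetic explicit.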

When to Use Which:

Text Embedding (Semantic Similarity):

Image Embedding (Visual Similarity):

Performance Considerations:

Similarity Search Performance:
├── Without index: ~2-5 seconds (scan all products)
├── With ivfflat index: ~50-200ms (much faster!)
└── With caching: ~10-30ms (best!)

Optimization:
• Create pgvector index on embedding columns
• Cache similar products per product (24 hour TTL)
• Pre-compute similarities for popular products
• Limit search to same category first (faster)

3. Search System

Purpose: Help users find products through text or image queries

A. Text Search (Semantic)

Traditional Keyword Search (What we're improving):

User searches: "comfortable work shoes"
↓
System finds: Products containing words "comfortable" AND "work" AND "shoes"
↓
Problems:
• Misses "office footwear" (different words, same meaning)
• Misses "business casual sneakers" (related but no exact match)
• Can't understand intent or context

Semantic Search (With Embeddings):

User searches: "comfortable work shoes"
↓
Generate embedding for query:
"comfortable work shoes" → [0.45, 0.78, 0.23, ...]
↓
Compare with ALL product text embeddings
↓
Find closest matches:
• "Business casual sneakers" ✅ (similarity: 0.89)
• "Office-appropriate footwear" ✅ (similarity: 0.86)
• "Professional dress shoes" ✅ (similarity: 0.82)
↓
Understands meaning, not just keywords!

Search Process:

Step 1: User Types Query
"gift for dad who likes tech"
↓
Step 2: Generate Query Embedding
Call Python service: "gift for dad who likes tech"
→ Returns: [0.34, 0.67, 0.91, ... 384 numbers]
↓
Step 3: Similarity Search
Compare query embedding with product text embeddings
Use cosine similarity to find closest matches
↓
Step 4: Ranking & Filtering
• Base ranking: Similarity score
• Boost: In stock items (+10%)
• Boost: Popular items (+5%)
• Boost: New arrivals (+5%)
• Filter: Remove out of budget (if known)
↓
Step 5: Return Results
Top 20 most relevant products
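Step 4 above can be sketched as a re-ranking pass over the similarity results. This is a hedged illustration: the candidate dictionary keys (`similarity`, `price`, `in_stock`, `popular`, `new_arrival`) are hypothetical, and treating "+10%" as a multiplicative boost on the similarity score is an assumption, since the source does not say whether boosts are additive or multiplicative.

```python
from typing import Dict, List, Optional


def rank_search_results(candidates: List[Dict],
                        max_budget: Optional[float] = None,
                        limit: int = 20) -> List[Dict]:
    """Apply the budget filter and ranking boosts, then keep the top `limit`."""
    ranked = []
    for c in candidates:
        # Filter: drop items over budget, but only when a budget is known.
        if max_budget is not None and c["price"] > max_budget:
            continue
        boost = 1.0
        if c.get("in_stock"):
            boost += 0.10
        if c.get("popular"):
            boost += 0.05
        if c.get("new_arrival"):
            boost += 0.05
        ranked.append({**c, "score": c["similarity"] * boost})
    ranked.sort(key=lambda c: c["score"], reverse=True)
    return ranked[:limit]


results = rank_search_results(
    [
        {"id": "a", "similarity": 0.80, "price": 50, "in_stock": True},
        {"id": "b", "similarity": 0.85, "price": 40},
        {"id": "c", "similarity": 0.99, "price": 500, "in_stock": True},
    ],
    max_budget=100,
)
print([r["id"] for r in results])
```

Note how the in-stock boost lets item "a" overtake "b" despite a lower raw similarity, while "c" is removed by the budget filter.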

Search Enhancements:

Query Understanding:
├── Intent detection: "gift" → Show gift-appropriate items
├── Context: "for dad" → Skew towards masculine products
├── Category hint: "tech" → Filter to electronics/gadgets
└── Budget inference: Previous searches/purchases

Ranking Factors:
├── Semantic relevance (primary): 60%
├── Popularity/sales: 20%
├── Recency (new products): 10%
└── User's past preferences: 10%
B. Visual Search (Image Upload)

Purpose: Find products by uploading a photo

How It Works:

Step 1: User Sees Product in Real Life
User sees cool shoes on Instagram/street
Takes photo or saves image
↓
Step 2: Upload to Nexgate
User opens app → Camera icon → Upload photo
↓
Step 3: Generate Image Embedding
Send image bytes to Python CLIP service
Process actual pixels (not URL or filename!)
→ Returns: [0.91, 0.23, 0.67, ... 512 numbers]
↓
Step 4: Find Visually Similar Products
Compare upload embedding with all product image embeddings
Use cosine similarity
↓
Step 5: Return Results
"Found 20 similar items"
Show visually matching products
Include similarity % for transparency

Visual Search Use Cases:

Use Case 1: "Find This Exact Product"
User uploads: Photo of specific Nike shoe
System finds: That exact shoe (if in catalog)
Or: Visually identical alternatives

Use Case 2: "Find Similar Style"
User uploads: Minimalist white sneakers
System finds: All minimal white shoe styles
From different brands

Use Case 3: "Shop the Look"
User uploads: Full outfit photo
System identifies: Individual items
Shows: Matching products for each piece

Use Case 4: "Find Better Price"
User uploads: Expensive designer bag
System finds: Similar looking bags
At various price points

Performance:

Visual Search Speed:
├── Image upload: ~1 second
├── Embedding generation: ~200-300ms
├── Similarity search: ~100-200ms (with index)
└── Total: ~1.5 seconds (acceptable!)

Optimization:
• Compress uploaded images (reduce transfer time)
• Process images at consistent size (224x224 for CLIP)
• Cache embeddings for popular uploaded images
• Use GPU for embedding generation (5x faster)

4. Group Buy Recommendations

Purpose: Connect users with active group purchases and maximize group completion rates

A. Find Groups to Join

Scenario:

User viewing: iPhone 15 ($999)
User thinking: "Too expensive..."
↓
System shows:
"💡 Join a group buy and save $150!
 🔥 3 active iPhone groups:
 • Group A: 8/10 people, 3 hours left → JOIN
 • Group B: 5/10 people, 24 hours left → JOIN
 • Group C: 2/10 people, 48 hours left → JOIN"

Recommendation Logic:

Finding Relevant Groups:

Step 1: Identify User's Interest
User action: Viewed/searched/added to cart iPhone
Interest level: High
↓
Step 2: Find Active Groups
Query all groups:
• Product: iPhone 15 (exact match)
• Status: Active (not expired/completed)
• Not full yet (has open spots)
↓
Step 3: Score Each Group
Group Score = 
    (0.40 × Product Match) +
    (0.30 × Urgency/Time Left) +
    (0.20 × Fill Rate) +
    (0.10 × Discount Size)

Example:
Group A:
• Product Match: 100% (exact iPhone)
• Urgency: 90% (3 hours left = high urgency)
• Fill Rate: 80% (8/10 = almost there!)
• Discount: 15% ($150 savings)
Score = (0.40×100) + (0.30×90) + (0.20×80) + (0.10×15)
     = 40 + 27 + 16 + 1.5 = 84.5 ⭐ TOP RECOMMENDATION!

Group C:
• Product Match: 100%
• Urgency: 20% (48 hours = low urgency)
• Fill Rate: 20% (2/10 = just started)
• Discount: 15%
Score = (0.40×100) + (0.30×20) + (0.20×20) + (0.10×15)
     = 40 + 6 + 4 + 1.5 = 51.5

↓
Step 4: Rank & Display
Show Group A first (highest score + urgency)
Display countdown timer
Show social proof (8 people already joined!)
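The group score in Step 3 is a weighted sum of four signals on a 0-100 scale. The sketch below reproduces the two worked examples; the function and constant names are hypothetical, and the urgency value is taken as an input because the source does not define how hours left map to an urgency percentage.

```python
# Weights from the Group Score formula above.
GROUP_WEIGHTS = {
    "product_match": 0.40,
    "urgency": 0.30,
    "fill_rate": 0.20,
    "discount": 0.10,
}


def fill_rate(members: int, capacity: int) -> float:
    """Share of group spots already taken, as a percentage."""
    return 100.0 * members / capacity


def group_score(product_match: float, urgency: float,
                fill: float, discount: float) -> float:
    """Weighted sum of the four signals, each expressed on a 0-100 scale."""
    signals = {
        "product_match": product_match,
        "urgency": urgency,
        "fill_rate": fill,
        "discount": discount,
    }
    return sum(GROUP_WEIGHTS[name] * value for name, value in signals.items())


group_a = group_score(100, 90, fill_rate(8, 10), 15)   # 8/10 members, 3 hours left
group_c = group_score(100, 20, fill_rate(2, 10), 15)   # 2/10 members, 48 hours left
print(round(group_a, 1), round(group_c, 1))
```

Sorting active groups by this score in descending order gives the display order used in Step 4.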

Similar Product Groups (Using Embeddings):

Scenario: User wants iPhone but no active iPhone groups

System finds:
"No active iPhone groups, but:
 📱 Samsung S24 Group: 6/10, 5 hours left → JOIN
 📱 Google Pixel Group: 4/8, 12 hours left → JOIN"

How: Use text embeddings to find similar products
iPhone embedding ≈ Samsung/Google phone embeddings
Show groups for similar high-end smartphones

Urgency Triggers:

Smart Notifications:

When group is 80% full:
→ "⚡ Only 2 spots left in iPhone group! Join now!"

When group has < 6 hours left:
→ "⏰ Group closes in 5 hours! Don't miss 15% off!"

When user's friend joins:
→ "👤 Sarah just joined iPhone group! Join her?"

When price drops to user's budget:
→ "💰 Now $849 with group buy - fits your budget!"
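The four notification triggers above can be evaluated with a simple rule function. This is a sketch under stated assumptions: the function name, parameter names, and returned trigger labels are hypothetical, and "80% full" is read as "at least 80% of spots taken".

```python
from typing import List, Optional


def urgency_triggers(members: int, capacity: int, hours_left: float,
                     friend_just_joined: bool = False,
                     group_price: Optional[float] = None,
                     user_budget: Optional[float] = None) -> List[str]:
    """Return the labels of every notification rule that currently fires."""
    triggers = []
    if members / capacity >= 0.80:
        triggers.append("almost_full")
    if hours_left < 6:
        triggers.append("closing_soon")
    if friend_just_joined:
        triggers.append("friend_joined")
    # The budget rule needs both values; skip it when either is unknown.
    if group_price is not None and user_budget is not None and group_price <= user_budget:
        triggers.append("within_budget")
    return triggers


print(urgency_triggers(8, 10, 5, group_price=849, user_budget=900))
```

A real notifier would also need rate limiting so that a group sitting at 80% does not re-trigger on every evaluation; that is outside what the source describes.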

B. Invite Friends to Your Group

Scenario:

User creates group: "MacBook Pro - Need 10 people"
Current: 3/10 members (user + 2 others)
↓
System suggests who to invite

Invitation Recommendation Logic:

Finding Best Invitees:

Step 1: Candidate Pool
├── User's followers
├── User's friends (mutual follows)
├── People user previously bought with
└── Active users who viewed similar products

Step 2: Score Each Potential Invitee
Invite Score = 
    (0.35 × Product Interest) +
    (0.25 × Social Connection) +
    (0.20 × Past Group Activity) +
    (0.15 × Budget Match) +
    (0.05 × Platform Activity)

Example - Inviting John:
• Product Interest: 90% (viewed MacBooks 3 times this month)
• Social Connection: 80% (mutual friend, engaged with user's posts)
• Past Group Activity: 70% (joined 2 groups before, completed both)
• Budget Match: 85% (bought items in $800-1500 range)
• Platform Activity: 95% (logs in daily, active buyer)

Score = (0.35×90) + (0.25×80) + (0.20×70) + (0.15×85) + (0.05×95)
     = 31.5 + 20 + 14 + 12.75 + 4.75 = 83 ⭐ EXCELLENT MATCH!

↓
Step 3: Personalized Invitation Message
"Invite John:
 💡 He viewed MacBook Pro recently
 👥 Mutual friend with Sarah (already in group)
 ⭐ Reliable (completed 2 past groups)"

[SEND INVITE]
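Steps 1 and 2 above amount to scoring each candidate and keeping the best few. The sketch below shows that ranking; the candidate structure, the signal key names, and the `top_n` cut-off are hypothetical, while the weights are the ones given in the Invite Score formula.

```python
from typing import Dict, List

# Weights from the Invite Score formula above.
INVITE_WEIGHTS = {
    "product_interest": 0.35,
    "social_connection": 0.25,
    "past_group_activity": 0.20,
    "budget_match": 0.15,
    "platform_activity": 0.05,
}


def invite_score(signals: Dict[str, float]) -> float:
    """Weighted sum of the five invitee signals (each on a 0-100 scale)."""
    return sum(weight * signals[name] for name, weight in INVITE_WEIGHTS.items())


def rank_invitees(candidates: List[Dict], top_n: int = 5) -> List[Dict]:
    """Return the `top_n` candidates with the highest invite score."""
    return sorted(candidates, key=lambda c: invite_score(c["signals"]), reverse=True)[:top_n]


john = {
    "name": "John",
    "signals": {
        "product_interest": 90,
        "social_connection": 80,
        "past_group_activity": 70,
        "budget_match": 85,
        "platform_activity": 95,
    },
}
print(round(invite_score(john["signals"]), 2))
```

The per-signal values would come from the candidate pool queries in Step 1; how each signal is normalized to 0-100 is not specified in the source.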

Social Proof in Invitations:

Smart Messaging:

To close friend:
"Hey! I'm buying a MacBook with 8 others. Join us?
 We save $200 each! Only 1 spot left."

To network connection:
"Group buy for MacBook Pro
 3 of your friends already joined
 Save 15% | 6 spots left"

To stranger (similar interests):
"MacBook Pro group buy
 Popular in tech community
 Join 9 others | Save $200"

C. Complete the Group (Urgency Marketing)

Scenario:

Group status: 8/10 members, 4 hours left
Need: 2 more people to unlock deal
Risk: Group expires without completion

Completion Strategy:

System Actions:

Action 1: Broadcast to High-Intent Users
Target:
├── Users who viewed this product
├── Users with product in cart
├── Users who joined similar groups
└── Friends of current members

Notification:
"⚡ URGENT: 2 spots left!
 iPhone group closes in 4 hours
 Join now and save $150"

Action 2: Discount Boost (If needed)
If group stuck at 80% for 2+ hours:
→ Platform adds extra 5% discount
→ "New deal: Save $150 → Save $200!"
→ Creates momentum

Action 3: Social Pressure
Show current members:
"Your group needs 2 more people
 Share with friends to complete deal
 [SHARE ON WHATSAPP] [SHARE ON INSTAGRAM]"

Action 4: Fallback Options
If group fails:
→ "Sorry, group didn't complete
   💡 Join this similar group instead: [...]
   OR: Get notified when new iPhone group starts"

Group Recommendation Priorities:

Priority Matrix:

Urgent + Almost Full = Highest Priority
├── 90% full + < 6 hours left
├── Show on feed prominently
├── Push notifications
└── Email reminders

High Interest Match = High Priority
├── Exact product user wants
├── Similar products (embeddings)
├── Category match
└── Show in search results

Social Connection = Medium Priority
├── Friends in group
├── Followed sellers
├── Past group partners
└── Show in "Friends' Activity"

General Discovery = Low Priority
├── Trending groups
├── Category exploration
├── New group announcements
└── Show in feed occasionally

5. Installment Recommendations

Purpose: Make expensive products affordable through payment plans

Key Principle: Show the right payment option at the right moment

A. "You Can Afford This!"

Scenario:

User profile:
โ€ข Average purchase: $50-200
โ€ข Max purchase: $400
โ€ข Never bought > $600

User views: MacBook Pro ($1,299)
Typical reaction: "Too expensive, can't afford"

Installment Intervention:

System detects:
Product price ($1,299) > User average purchase ($150) × 3 = $450
↓
Calculate monthly payment:
$1,299 ÷ 12 months = $108/month
↓
Check affordability:
$108/month < User max purchase ($400) × 1.5 = $600? ✅ YES
↓
Show prominent banner:
┌─────────────────────────────────────┐
│ 💡 You can afford this!             │
│ Pay just $108/month for 12 months   │
│ • No interest                       │
│ • Pay ahead anytime                 │
│ • Cancel anytime                    │
│ [SEE PAYMENT OPTIONS]               │
└─────────────────────────────────────┘

Affordability Logic:

Trigger Rules:

Show Installment When:
1. Product price > (User's average purchase × 3)
2. Monthly payment < (User's max purchase × 1.5)
3. Product in user's interested categories
4. User has good payment history (if existing customer)

Example:
Product: $1,299
User avg: $150
User max: $400

Check 1: $1,299 > ($150 × 3) = $1,299 > $450 ✅
Check 2: $108/month < ($400 × 1.5) = $108 < $600 ✅
→ SHOW INSTALLMENT OPTION!

Don't Show If:
• Product too cheap (< $200)
• User can afford full price easily
• Monthly payment still too high for user
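The two numeric trigger rules and the "too cheap" exclusion can be sketched as a single check. This is a minimal illustration: the function names and defaults are hypothetical, a 12-month interest-free plan is assumed as in the example, and rules 3 and 4 (category interest and payment history) are left out because the source does not define how they are measured.

```python
def monthly_payment(price: float, months: int = 12) -> float:
    """Interest-free monthly payment, as in the $1,299 / 12 example."""
    return price / months


def should_show_installment(price: float, avg_purchase: float, max_purchase: float,
                            months: int = 12, min_price: float = 200.0) -> bool:
    """Apply trigger rules 1 and 2 plus the 'product too cheap' exclusion."""
    if price < min_price:
        return False                      # Don't show: product too cheap
    if price <= 3 * avg_purchase:
        return False                      # Rule 1 not met: user can likely pay in full
    # Rule 2: the monthly payment must fit under 1.5x the user's largest purchase.
    return monthly_payment(price, months) < 1.5 * max_purchase


print(should_show_installment(1299, avg_purchase=150, max_purchase=400))
```

With the example user (average $150, max $400) the $1,299 product passes both checks, matching the worked example above.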

Payment Plan Recommendations:

Personalized Plans:

For Budget-Conscious User:
"12 months at $108/month" (smallest monthly)

For Established User:
"6 months at $216/month" (faster payoff)

For Premium User:
"3 months at $433/month" (shortest payoff period)

Dynamic Suggestion:
Based on user's typical payment speed:
• Pays bills early → Suggest shorter term
• Budget constrained → Suggest longer term
• First-time → Suggest flexible middle option

B. Premium Product Feed

Scenario:

User interested in: Laptops
User budget: $300-500
Problem: Premium laptops cost $1000+

Solution - Installment Feed Section:

┌─────────────────────────────────────┐
│ 💎 Premium Products - Easy Payments │
├─────────────────────────────────────┤
│ MacBook Pro                         │
│ $1,299 → Just $108/month            │
│ ⭐ 4.9 stars | 🔥 Hot deal          │
│ "Fits your monthly budget!"         │
├─────────────────────────────────────┤
│ Dell XPS 15                         │
│ $1,199 → Just $100/month            │
│ ⭐ 4.8 stars | 💼 Professional      │
│ "Popular in your network"           │
└─────────────────────────────────────┘

Feed Scoring for Installment Products:

Installment Product Score =
    (0.40 × Category Interest) +
    (0.30 × Affordability with Installment) +
    (0.20 × Product Quality) +
    (0.10 × Social Proof)

Example - MacBook for User:
• Category: 95% (loves tech, views laptops often)
• Affordability: 90% ($108/month fits budget)
• Quality: 98% (4.9 star rating, premium brand)
• Social: 75% (3 friends own MacBooks)

Score = (0.40×95) + (0.30×90) + (0.20×98) + (0.10×75)
     = 38 + 27 + 19.6 + 7.5 = 92.1 ⭐⭐⭐

This scores higher than cheaper laptops because:
• Installments make it affordable
• Matches user's aspirations
• Higher quality product

C. Upgrade Suggestions

Scenario:

User adding to cart: iPhone 15 ($799)
Better option exists: iPhone 15 Pro ($999)
Difference: $200

Upgrade Recommendation:

┌─────────────────────────────────────┐
│ 💡 Upgrade to iPhone 15 Pro?        │
│                                     │
│ Current choice: $799                │
│ iPhone 15 Pro: $999 (+$200)         │
│                                     │
│ With 12-month plan:                 │
│ • Standard: $67/month               │
│ • Pro: $83/month (+$16)             │
│                                     │
│ Pro benefits:                       │
│ • Better camera system              │
│ • Titanium design                   │
│ • More storage                      │
│                                     │
│ [UPGRADE FOR JUST $16/MONTH]        │
└─────────────────────────────────────┘

Upgrade Logic:

When to Suggest Upgrade:

Condition 1: Premium version exists
iPhone 15 → iPhone 15 Pro ✅

Condition 2: Monthly difference is small
Standard: $67/month
Pro: $83/month
Difference: $16/month (< $20) ✅

Condition 3: User can afford it
User's budget allows +$16/month ✅

Condition 4: Meaningful upgrade
Pro version has substantial improvements ✅

→ SHOW UPGRADE SUGGESTION!

Don't Suggest If:
• Difference > $30/month (too much)
• Upgrade is minimal (not worth it)
• User explicitly chose budget option
• User on tightest plan already
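The upgrade conditions above can be sketched as a single predicate. The function and parameter names are hypothetical, the $20/month threshold is taken from Condition 2, and `monthly_headroom` (how much extra per month the user's budget allows) is an assumed input, since the source does not say how the budget is derived. Note that the unrounded difference for the iPhone example is about $16.67/month; the $16 figure above comes from subtracting the rounded monthly amounts.

```python
def should_suggest_upgrade(base_price: float, premium_price: float,
                           monthly_headroom: float, months: int = 12,
                           max_monthly_delta: float = 20.0,
                           meaningful_upgrade: bool = True,
                           chose_budget_option: bool = False) -> bool:
    """Return True when all four upgrade conditions hold."""
    if chose_budget_option or not meaningful_upgrade:
        return False
    # Extra cost per month of the premium version on the same plan length.
    delta = (premium_price - base_price) / months
    return 0 < delta < max_monthly_delta and delta <= monthly_headroom


print(should_suggest_upgrade(799, 999, monthly_headroom=25))
```

Whether a premium version exists (Condition 1) and whether it is a meaningful upgrade (Condition 4) are catalog judgments, so the sketch takes them as inputs rather than computing them.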

D. Bundle Recommendations

Scenario:

User purchased: MacBook Pro ($1,299)
Payment plan: $108/month for 12 months

Bundle Suggestion:

┌─────────────────────────────────────┐
│ 📦 Complete Your Setup              │
│                                     │
│ Add to your payment plan:           │
├─────────────────────────────────────┤
│ 🖱️ Magic Mouse                      │
│ $79 → +$7/month                     │
│ New total: $115/month               │
├─────────────────────────────────────┤
│ ⌨️ Magic Keyboard                   │
│ $99 → +$8/month                     │
│ New total: $116/month               │
├─────────────────────────────────────┤
│ 💼 Laptop Sleeve                    │
│ $49 → +$4/month                     │
│ New total: $112/month               │
│                                     │
│ 💡 Still within your budget!        │
└─────────────────────────────────────┘

Bundle Logic:

Accessory Recommendation Rules:

Rule 1: Complementary Products
Main: MacBook
Suggest: Mouse, keyboard, bag, USB-C accessories
(Use collaborative filtering: "bought together")

Rule 2: Affordable Addition
Current: $108/month
User budget: Up to $200/month
Available: $92/month for accessories
→ Suggest items totaling ≤ $92/month

Rule 3: Prioritize by Value
High priority:
• Essential (mouse, charger)
• High satisfaction (4.5+ stars)
• Popular (many bought together)

Low priority:
• Optional accessories
• Lower rated items
• Expensive add-ons

Rule 4: Convenience Factor
"Add all 3 accessories for +$19/month
 Save time, get complete setup!"
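Rules 2 and 3 above can be combined into a greedy selection: sort accessories by priority, then add each one while the plan stays inside the monthly budget. The sketch is illustrative only. The function name, the `essential` and `rating` fields, and the sample flags and ratings are hypothetical, and a 12-month interest-free plan is assumed.

```python
from typing import Dict, List, Tuple


def pick_accessories(accessories: List[Dict], current_monthly: float,
                     budget_monthly: float, months: int = 12) -> Tuple[List[str], float]:
    """Greedily add accessories, highest priority first, within the budget.

    Returns the chosen accessory names and the added monthly cost.
    """
    available = budget_monthly - current_monthly
    chosen: List[str] = []
    added = 0.0
    # Priority: essentials first, then higher-rated items.
    for item in sorted(accessories, key=lambda a: (not a["essential"], -a["rating"])):
        cost = item["price"] / months
        if added + cost <= available:
            chosen.append(item["name"])
            added += cost
    return chosen, added


catalog = [
    {"name": "Magic Mouse", "price": 79, "essential": True, "rating": 4.7},
    {"name": "Magic Keyboard", "price": 99, "essential": False, "rating": 4.6},
    {"name": "Laptop Sleeve", "price": 49, "essential": False, "rating": 4.5},
]
names, extra = pick_accessories(catalog, current_monthly=108, budget_monthly=200)
print(names, round(extra))
```

With $92/month of headroom all three accessories fit, for roughly $19/month extra, which matches the "add all 3" message in Rule 4.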

E. Payment Flexibility Highlights

Key Feature: Users can pay ahead on installments

Recommendation Angle:

┌─────────────────────────────────────┐
│ 💰 Flexible Payments                │
│                                     │
│ Start at $108/month                 │
│ • Got a bonus? Pay extra anytime    │
│ • Finish early, save on duration    │
│ • No penalties for early payment    │
│ • Pause if needed (1 month grace)   │
│                                     │
│ "Pay at your own pace!"             │
└─────────────────────────────────────┘

When to Highlight:

For New Users:
"Flexible payment plans available
 Pay early, pay later - you choose"

For Hesitant Users:
"Not sure? Start with small payments
 You can always pay more when ready"

For Seasonal Workers:
"Pay extra during busy season
 Reduce payments during slow months"

For Goal-Oriented Users:
"Finish your payments early
 Track progress in dashboard"

6. Combined Recommendations (Group Buy + Installments)

The Ultimate Deal Strategy

Scenario:

Product: MacBook Pro
Regular price: $1,299
Group buy discount: $200
Installment option: Available

Combined Offer:

┌─────────────────────────────────────┐
│ 🎯 BEST DEAL COMBO!                 │
│                                     │
│ MacBook Pro                         │
│ Regular: $1,299                     │
│                                     │
│ 🤝 Join Group: $1,099 (Save $200!)  │
│ 💳 + Pay Monthly: $92/month         │
│                                     │
│ 👥 7/10 people | ⏰ 6 hours left    │
│                                     │
│ ✨ Best combo deal:                 │
│ • Save $200 with group              │
│ • Affordable at just $92/month      │
│ • No interest                       │
│                                     │
│ [JOIN GROUP + START PAYMENTS]       │
└─────────────────────────────────────┘

Combined Score Formula:

Ultimate Deal Score =
    (0.25 × Category Interest) +
    (0.25 × Affordability Boost) +
    (0.20 × Group Discount Size) +
    (0.15 × Social Proof) +
    (0.10 × Urgency) +
    (0.05 × Past Group Success)

Each factor is scored from 0 to 100 and the weights sum to 1.0, so the result is also on a 0-100 scale.

Example - MacBook for User:
• Category: 95% (loves tech)
• Affordability: 95% ($92/month, down from $108!)
• Discount: 100% ($200 is huge savings)
• Social: 70% (7 people joined)
• Urgency: 90% (6 hours left!)
• Past: 80% (completed 2 groups)

Score = (0.25×95) + (0.25×95) + (0.20×100) + 
        (0.15×70) + (0.10×90) + (0.05×80)
     = 23.75 + 23.75 + 20 + 10.5 + 9 + 4
     = 91 ⭐⭐⭐ MAXIMUM APPEAL!
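
The weighted sum above can be checked with a few lines of Python. This is a minimal sketch, not the production scorer: the dictionary keys and the function name `ultimate_deal_score` are hypothetical, and each signal is assumed to arrive already normalized to a 0-100 scale.

```python
# Weights copied from the Combined Score Formula; they sum to 1.0.
WEIGHTS = {
    "category_interest": 0.25,
    "affordability_boost": 0.25,
    "group_discount_size": 0.20,
    "social_proof": 0.15,
    "urgency": 0.10,
    "past_group_success": 0.05,
}


def ultimate_deal_score(signals: dict[str, float]) -> float:
    """Return the weighted sum of the six signals (each on a 0-100 scale)."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


# Signal values from the MacBook example.
macbook = {
    "category_interest": 95,
    "affordability_boost": 95,
    "group_discount_size": 100,
    "social_proof": 70,
    "urgency": 90,
    "past_group_success": 80,
}
print(round(ultimate_deal_score(macbook), 2))  # 91.0
```

Keeping the weights in one table makes it easy to A/B test alternative weightings without touching the scoring code.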

Why Combined Works:

Psychological Triggers:

1. Double Savings:
   "Save $200 + Pay less monthly = Win-Win"

2. Urgency:
   "Group closes in 6 hours! Act now!"

3. Social Proof:
   "7 others already in! Don't miss out!"

4. Affordability:
   "$92/month fits your budget easily"

5. Low Risk:
   "Flexible payments + Group guarantee"

Conversion Multiplier:
โ€ข Group buy alone: 8% conversion
โ€ข Installment alone: 15% conversion
โ€ข COMBINED: 25-30% conversion! ๐Ÿš€

Feed Placement:

Priority Positioning:

Top of Feed:
โ€ข Active group + installment combos
โ€ข Closing soon (<6 hours)
โ€ข User's interested categories

Mid-Feed:
โ€ข New group + installment deals
โ€ข Popular combos (high join rate)
โ€ข Category exploration

Bottom Feed:
โ€ข General group buy awareness
โ€ข Installment education
โ€ข Success stories

🤖 Embedding System Deep Dive

What Are Embeddings? (Conceptual Understanding)

Simple Analogy:

Think of embeddings as "coordinates" for meaning:

Words/Images → Machine Learning Model → Numbers (coordinates)

Similar meanings → Similar coordinates → Close together in space

Example in 2D (real embeddings are 384 or 512 dimensions!):

"Red Shoes"        → Point (0.8, 0.2)
"Crimson Sneakers" → Point (0.82, 0.18)  ← Close!
"Blue Laptop"      → Point (0.1, 0.9)    ← Far away!

Distance between "Red Shoes" and "Crimson Sneakers" is small
→ They're similar!

Architecture of Embedding System

┌───────────────────────────────────────────────┐
│          SPRING BOOT BACKEND                  │
│                                               │
│  Need embedding for product/search query?     │
│              ↓                                │
│  HTTP REST Call to Python Service             │
│  POST /embed-text or POST /embed-image        │
│  Body: {text: "..."} or {image_bytes: "..."}  │
└────────────────┬──────────────────────────────┘
                 │
                 │ HTTP Request
                 ↓
┌───────────────────────────────────────────────┐
│       PYTHON EMBEDDING SERVICE                │
│       (Flask Microservice)                    │
│                                               │
│  ┌─────────────────────────────────────────┐  │
│  │  Models Loaded in Memory (at startup):  │  │
│  │                                         │  │
│  │  • Sentence Transformers                │  │
│  │    (all-MiniLM-L6-v2)                   │  │
│  │    → Text embeddings (384 dims)         │  │
│  │                                         │  │
│  │  • CLIP                                 │  │
│  │    (openai/clip-vit-base-patch32)       │  │
│  │    → Image embeddings (512 dims)        │  │
│  └─────────────────────────────────────────┘  │
│                                               │
│  Receives request → Processes → Returns array │
└────────────────┬──────────────────────────────┘
                 │
                 │ Response: [0.23, 0.87, ...]
                 ↓
┌───────────────────────────────────────────────┐
│          SPRING BOOT BACKEND                  │
│                                               │
│  Receives embedding array                     │
│              ↓                                │
│  Stores in PostgreSQL (vector column)         │
│  OR: Uses for similarity search               │
└───────────────────────────────────────────────┘

When Embeddings Are Generated

Trigger 1: Product Creation/Update

Flow:
Admin adds product
โ†’ Product saved to database (ID assigned)
โ†’ Background async job triggered
โ†’ Generate text embedding (name + description)
โ†’ Generate image embedding (from image URL)
โ†’ Update product record with both embeddings
โ†’ Product now searchable by similarity!

Timeline:
โ€ข Product creation: Immediate (< 100ms)
โ€ข Embedding generation: Background (2-5 seconds)
โ€ข Total user-facing delay: None (async)

Trigger 2: Search Query

Flow:
User types search query
โ†’ Frontend sends to backend
โ†’ Backend calls Python service
โ†’ Python generates query embedding
โ†’ Backend compares with product embeddings
โ†’ Returns similar products
โ†’ Frontend displays results

Timeline:
โ€ข User types โ†’ Results: ~500-800ms
โ€ข Embedding generation: ~50-100ms
โ€ข Search: ~100-200ms
โ€ข Network + rendering: ~300-500ms

Trigger 3: Image Upload (Visual Search)

Flow:
User uploads image
โ†’ Frontend sends image bytes
โ†’ Backend calls Python service
โ†’ Python generates image embedding
โ†’ Backend finds similar product images
โ†’ Returns visually similar products

Timeline:
โ€ข Image upload: ~500ms-1s (depends on size/network)
โ€ข Embedding generation: ~200-400ms
โ€ข Search: ~100-200ms
โ€ข Total: ~1-2 seconds (acceptable for visual search)

Trigger 4: Batch Processing

For existing products without embeddings:

Flow:
Admin triggers batch job
โ†’ Fetch products in batches (100 at a time)
โ†’ Generate embeddings for batch
โ†’ Save all embeddings
โ†’ Repeat for next batch

Timeline:
• 100 products: ~10-15 seconds
• 10,000 products: ~17-25 minutes (100 batches at 10-15 seconds each)
• Run once, or nightly for new products

Embedding Storage

Data Stored:

Product Table includes:
โ”œโ”€โ”€ id (primary key)
โ”œโ”€โ”€ name, description, price, etc.
โ”œโ”€โ”€ text_embedding (vector type, 384 dimensions)
โ””โ”€โ”€ image_embedding (vector type, 512 dimensions)

Each embedding stored as array of floats:
text_embedding: [0.234, 0.876, 0.123, ... 384 numbers]
image_embedding: [0.912, 0.234, 0.567, ... 512 numbers]

Storage per product:
โ€ข Text: 384 floats ร— 4 bytes = 1.5 KB
โ€ข Image: 512 floats ร— 4 bytes = 2.0 KB
โ€ข Total: 3.5 KB per product

For 10,000 products: 35 MB (tiny!)
For 1,000,000 products: 3.5 GB (manageable!)

Similarity Search Process

How Similarity Is Calculated:

Cosine Similarity Formula:

similarity = (A · B) / (||A|| × ||B||)

Where:
A · B = dot product (sum of the products of each pair of components)
||A|| = magnitude of vector A
||B|| = magnitude of vector B

Result: Number between -1 (opposite direction) and 1 (same direction); 0 means unrelated. Higher means more similar.

Example:
Product A: [0.8, 0.6]
Product B: [0.9, 0.5]

A · B = (0.8 × 0.9) + (0.6 × 0.5) = 0.72 + 0.30 = 1.02
||A|| = √(0.8² + 0.6²) = √1.00 = 1.0
||B|| = √(0.9² + 0.5²) = √1.06 = 1.03

similarity = 1.02 / (1.0 × 1.03) = 0.99 (very similar!)
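
The same calculation in plain Python, as a minimal sketch for checking the worked example by hand. In production the comparison would run inside PostgreSQL through pgvector rather than in application code; the function name `cosine_similarity` is just illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Worked example from the text.
print(round(cosine_similarity([0.8, 0.6], [0.9, 0.5]), 2))  # 0.99

# The 2D toy points: similar products score higher than unrelated ones.
red_shoes, crimson_sneakers, blue_laptop = [0.8, 0.2], [0.82, 0.18], [0.1, 0.9]
print(cosine_similarity(red_shoes, crimson_sneakers) >
      cosine_similarity(red_shoes, blue_laptop))  # True
```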

Search Performance:

Without Index:
โ€ข Compares with every product
โ€ข 10,000 products: ~2-5 seconds
โ€ข 100,000 products: ~20-50 seconds
โ€ข NOT acceptable!

With pgvector Index (ivfflat):
โ€ข Uses approximate nearest neighbor (ANN)
โ€ข Groups similar vectors together
โ€ข Only searches relevant groups
โ€ข 10,000 products: ~50-100ms
โ€ข 100,000 products: ~100-300ms
โ€ข 1,000,000 products: ~200-500ms
โ€ข Acceptable! โœ…

Trade-off:
โ€ข 100% accuracy vs 95-99% accuracy
โ€ข Worth it for speed improvement

Model Selection

Text Embeddings:

Model: all-MiniLM-L6-v2 (Sentence Transformers)

Pros:
โœ… Fast (50ms per embedding on CPU)
โœ… Small size (90 MB model)
โœ… Good quality for e-commerce
โœ… 384 dimensions (manageable)
โœ… Free and open source

Use for:
โ€ข Product descriptions
โ€ข Search queries
โ€ข Category matching
โ€ข Semantic similarity

Alternative if needed:
โ€ข all-mpnet-base-v2 (better quality, slower, 768 dims)
โ€ข multilingual models (if serving multiple countries)

Image Embeddings:

Model: CLIP (openai/clip-vit-base-patch32)

Pros:
โœ… Understands both images AND text
โœ… Can search images with text!
โœ… Good visual similarity
โœ… 512 dimensions
โœ… Free and open source

Use for:
โ€ข Product images
โ€ข Visual search
โ€ข Style matching
โ€ข "Shop the look"

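Unique Feature:
Can compare text with images!
Query: "red shoes" (CLIP text embedding, 512 dims)
Find: Red shoe products (CLIP image embedding, 512 dims)
Cross-modal search! 🔥
Note: the query must be embedded with CLIP's own text encoder. all-MiniLM-L6-v2 vectors (384 dims) live in a different space and cannot be compared with image embeddings.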

Optimization Strategies

Caching:

Strategy 1: Cache Similar Products
โ€ข After computing similarities, cache results
โ€ข TTL: 24 hours (products don't change often)
โ€ข Key: product_id
โ€ข Value: [similar_product_ids]

Strategy 2: Cache Query Embeddings
โ€ข Common searches generate same embeddings
โ€ข Cache: "red shoes" โ†’ [0.23, 0.87, ...]
โ€ข TTL: 1 hour
โ€ข Saves Python service calls

Strategy 3: Cache Popular Product Embeddings
โ€ข Keep hot product embeddings in Redis
โ€ข Faster than PostgreSQL lookup
โ€ข Updates when product changes
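
Strategy 2 is a cache-aside lookup: check the cache, call the embedding service only on a miss, then store the result. Below is a minimal sketch under stated assumptions: an in-process dictionary stands in for Redis, `embed` is a hypothetical callable wrapping the Python service call, and the `query_emb:` key prefix is invented for illustration.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory stand-in for Redis SET with an expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, list[float]]] = {}

    def get(self, key: str) -> Optional[list[float]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: list[float]) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_query_embedding(query: str, cache: TTLCache,
                        embed: Callable[[str], list[float]]) -> list[float]:
    """Return a cached embedding, calling `embed` only on a cache miss."""
    # Normalize case and whitespace so "Red  Shoes" and "red shoes" share a key.
    key = "query_emb:" + " ".join(query.lower().split())
    cached = cache.get(key)
    if cached is not None:
        return cached
    embedding = embed(query)
    cache.set(key, embedding)
    return embedding
```

With a 1-hour TTL (`TTLCache(3600)`), repeated popular searches skip the embedding call entirely until the entry expires.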

Batch Processing:

Instead of:
โ€ข Generate 1 embedding โ†’ 50ms
โ€ข Generate 100 embeddings โ†’ 5 seconds (sequential)

Better:
โ€ข Generate 100 embeddings in batch โ†’ 2 seconds!
โ€ข 2.5x faster by batching

Use for:
โ€ข Initial product catalog population
โ€ข Nightly updates
โ€ข Bulk imports

Index Tuning:

pgvector index parameters:

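lists: Number of clusters
• More lists = faster search, less accuracy (each query scans fewer vectors)
• Fewer lists = slower search, more accuracy
• Sweet spot: √(number of products)
• 10,000 products → lists = 100
• 100,000 products → lists = 316
• For comparison, pgvector's documentation suggests starting at rows / 1000 for up to 1M rows and √rows above that

probes: Number of clusters searched per query (default: 1)
• More probes = more accuracy, slower search
• pgvector's documentation suggests starting at √lists

Recommendation:
Start with lists = 100
Monitor search speed and accuracy
Adjust lists and probes if needed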
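
A small helper can make the starting values explicit. This sketch computes both the √N rule used in this document and the starting point from pgvector's documentation (rows / 1000 up to 1M rows, √rows beyond), plus the matching SQL. The table name `products` is an assumption; `text_embedding` is the column named in the storage section.

```python
import math


def suggest_ivfflat_lists(row_count: int) -> dict[str, int]:
    """Return candidate `lists` values under both heuristics."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    sqrt_rule = max(1, round(math.sqrt(row_count)))
    if row_count <= 1_000_000:
        pgvector_rule = max(1, row_count // 1000)
    else:
        pgvector_rule = sqrt_rule
    return {"sqrt_rule": sqrt_rule, "pgvector_rule": pgvector_rule}


def suggest_probes(lists: int) -> int:
    """Starting value for ivfflat.probes: roughly the square root of lists."""
    return max(1, round(math.sqrt(lists)))


def create_index_sql(lists: int, table: str = "products",
                     column: str = "text_embedding") -> str:
    """Build the pgvector statement for a cosine-distance ivfflat index."""
    return (f"CREATE INDEX ON {table} "
            f"USING ivfflat ({column} vector_cosine_ops) "
            f"WITH (lists = {lists});")


print(suggest_ivfflat_lists(100_000))  # {'sqrt_rule': 316, 'pgvector_rule': 100}
print(create_index_sql(100))
print(f"SET ivfflat.probes = {suggest_probes(100)};")  # SET ivfflat.probes = 10;
```

Whichever rule is used as the starting point, recall should be measured against an exact (unindexed) search on a sample of queries before settling on final values.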

📱 Feed Algorithm (Complete Flow)

Feed Generation Process (Step-by-Step)

User Opens App โ†’ Request Feed

REQUEST:
GET /api/feed?limit=20

HEADERS:
Authorization: Bearer {user_token}

Backend Process:

Step 1: User Profile Loading (< 50ms)

Check Redis cache:
โ€ข User's following list (Key: user:{id}:following)
โ€ข User's interested categories (Key: user:{id}:categories)
โ€ข User's seen posts (Key: user:{id}:seen_posts)

If not cached:
โ€ข Query PostgreSQL
โ€ข Store in Redis (TTL: 1 hour for following, 6 hours for categories)

Step 2: Candidate Gathering (< 200ms)

Parallel Queries (executed simultaneously):

Query A: Posts from Following
โ€ข Get posts from last 7 days
โ€ข From sellers user follows
โ€ข Not seen by user
โ€ข In stock
โ†’ Returns ~40-60 posts

Query B: Trending in Categories
โ€ข User's top 3 categories
โ€ข High engagement rate (>5%)
โ€ข Posted in last 48 hours
โ€ข Marked as trending (pre-computed)
โ†’ Returns ~20-30 posts

Query C: Local Sellers
โ€ข Same city/region as user
โ€ข Active posts (last 7 days)
โ€ข Not seen by user
โ†’ Returns ~10-20 posts

Query D: Sponsored
โ€ข Active campaigns
โ€ข Targeted to user's interests
โ€ข Budget remaining
โ†’ Returns ~5-10 posts

Total Candidates: ~75-120 posts (sum of the four queries, before de-duplication)
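
The parallel fetch in Step 2 can be sketched as follows. The real backend is Spring Boot, so this Python version only illustrates the pattern: the source names and fetch callables are hypothetical stand-ins for Queries A-D, and results are merged in a fixed order with duplicates removed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def gather_candidates(sources: dict[str, Callable[[], list[int]]]) -> list[int]:
    """Run every candidate query concurrently and merge the post IDs.

    `sources` maps a query name to a zero-argument function returning post IDs.
    Merge order follows the order of `sources`; duplicates keep their first slot.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in sources.items()}
        results = {name: future.result() for name, future in futures.items()}

    seen: set[int] = set()
    merged: list[int] = []
    for name in sources:
        for post_id in results[name]:
            if post_id not in seen:
                seen.add(post_id)
                merged.append(post_id)
    return merged


# Stubbed queries; real ones would hit PostgreSQL or Redis.
candidates = gather_candidates({
    "following": lambda: [101, 102, 103],
    "trending": lambda: [103, 201],
    "local": lambda: [301, 101],
    "sponsored": lambda: [401],
})
print(candidates)  # [101, 102, 103, 201, 301, 401]
```

Because the queries run concurrently, the step takes roughly as long as the slowest query rather than the sum of all four.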

Step 3: Filtering (< 50ms)

Remove:
โ€ข Already seen by user (Redis lookup)
โ€ข Out of stock products
โ€ข Blocked/reported sellers
โ€ข Duplicate products
โ€ข Posts from blocked users

Remaining: ~60-100 posts

Step 4: Scoring Each Post (< 300ms)

For each post, calculate score (0-100):

A. Social Score (35 points max):
โ”œโ”€โ”€ Following seller? +15
โ”œโ”€โ”€ Friends engaged? +2 per friend (max 10)
โ”œโ”€โ”€ Seller verified? +5
โ””โ”€โ”€ Seller rating: +0 to 5 (based on stars)

B. Engagement Prediction (25 points):
โ”œโ”€โ”€ Category match? +10
โ”œโ”€โ”€ Price in range? +8
โ””โ”€โ”€ Similar to past likes? +7

C. Personalization (20 points):
โ”œโ”€โ”€ Matches search keywords? +2 each (max 8)
โ”œโ”€โ”€ Local seller? +7
โ””โ”€โ”€ Language/preferences? +5

D. Recency (15 points):
โ”œโ”€โ”€ <1 hour: +15
โ”œโ”€โ”€ <6 hours: +12
โ”œโ”€โ”€ <24 hours: +8
โ”œโ”€โ”€ <3 days: +4
โ””โ”€โ”€ Older: +1

E. Quality (5 points):
โ”œโ”€โ”€ 3+ images? +2
โ”œโ”€โ”€ Detailed description? +1
โ””โ”€โ”€ High engagement rate? +2

Example Post Score:
โ€ข Social: 15 (following) + 4 (2 friends) + 5 (verified) + 4.5 (rating) = 28.5
โ€ข Engagement: 10 (category) + 8 (price) + 7 (similar) = 25
โ€ข Personalization: 4 (keywords) + 7 (local) + 5 (prefs) = 16
โ€ข Recency: 12 (<6 hours) = 12
โ€ข Quality: 2 (images) + 1 (description) + 2 (engagement) = 5
โ†’ Total: 86.5 points โญโญโญ
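
The rubric above translates directly into a scoring function. This is a minimal sketch: the `PostSignals` field names are hypothetical, and the seller rating is assumed to contribute its star value (0-5) as points, which matches the 4.5 used in the example.

```python
from dataclasses import dataclass


@dataclass
class PostSignals:
    following_seller: bool
    friends_engaged: int
    seller_verified: bool
    seller_rating: float  # stars, 0-5
    category_match: bool
    price_in_range: bool
    similar_to_past_likes: bool
    keyword_matches: int
    local_seller: bool
    matches_preferences: bool
    age_hours: float
    image_count: int
    detailed_description: bool
    high_engagement: bool


def recency_points(age_hours: float) -> int:
    """Section D: newer posts earn more points."""
    if age_hours < 1:
        return 15
    if age_hours < 6:
        return 12
    if age_hours < 24:
        return 8
    if age_hours < 72:
        return 4
    return 1


def score_post(s: PostSignals) -> float:
    """Sum sections A-E of the rubric; the maximum is 100."""
    social = ((15 if s.following_seller else 0)
              + min(2 * s.friends_engaged, 10)
              + (5 if s.seller_verified else 0)
              + min(max(s.seller_rating, 0.0), 5.0))
    engagement = ((10 if s.category_match else 0)
                  + (8 if s.price_in_range else 0)
                  + (7 if s.similar_to_past_likes else 0))
    personalization = (min(2 * s.keyword_matches, 8)
                       + (7 if s.local_seller else 0)
                       + (5 if s.matches_preferences else 0))
    quality = ((2 if s.image_count >= 3 else 0)
               + (1 if s.detailed_description else 0)
               + (2 if s.high_engagement else 0))
    return social + engagement + personalization + recency_points(s.age_hours) + quality


example = PostSignals(True, 2, True, 4.5, True, True, True, 2,
                      True, True, 3.0, 3, True, True)
print(score_post(example))  # 86.5
```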

Step 5: Ranking & Diversity (< 100ms)

Primary Sort: By score (highest first)

Apply Diversity Rules:
1. No more than 2 consecutive posts from same seller
   If post N and N+1 and N+2 are from Seller A:
   → Move post N+2 down

2. Category distribution in top 10
   Count categories in positions 1-10
   If one category > 5 posts:
   โ†’ Demote some, promote others

3. Price variance
   Check price distribution in top 20
   If all high-priced or all low-priced:
   โ†’ Inject middle-price items

4. Content type mix
   Aim for: 70% product posts, 15% reviews, 10% tutorials, 5% behind-the-scenes
   Adjust positions to achieve balance

5. Freshness boost (random)
   Randomly boost 2-3 recent high-quality posts (score >70)
   Prevents feed from being too predictable

6. Sponsored insertion
   Insert 1 sponsored post at position 7-10
   Feels native, not intrusive
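
Rule 1 can be implemented as a single greedy pass over the score-sorted list. The sketch below follows the rule's example, where the third consecutive post from one seller is moved down; the `(post_id, seller_id)` tuples and the function name are assumptions for illustration.

```python
def enforce_seller_diversity(posts: list[tuple[int, str]],
                             max_run: int = 2) -> list[tuple[int, str]]:
    """Reorder so no seller appears more than `max_run` times in a row.

    `posts` is a list of (post_id, seller_id) sorted by score, highest first.
    A post that would extend a run is skipped in favor of the next-best post
    from another seller, then placed as soon as the run is broken.
    """
    remaining = list(posts)
    result: list[tuple[int, str]] = []
    while remaining:
        pick = 0
        if len(result) >= max_run:
            recent_sellers = {seller for _, seller in result[-max_run:]}
            if len(recent_sellers) == 1:
                blocked = next(iter(recent_sellers))
                # First remaining post from a different seller; if none exists,
                # fall back to index 0 and let the run continue.
                pick = next((i for i, (_, seller) in enumerate(remaining)
                             if seller != blocked), 0)
        result.append(remaining.pop(pick))
    return result


ranked = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "A")]
print([post_id for post_id, _ in enforce_seller_diversity(ranked)])
# [1, 2, 4, 3, 5]
```

The pass preserves score order wherever the rule is not violated, so high-scoring posts move down by as few positions as possible.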

Step 6: Pagination & Response (< 50ms)

Take top 20 posts from ranked list

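Generate cursor:
• cursor = ID of 20th post
• Next request uses: ?cursor=12345&limit=20
• Returns the posts ranked after post 12345 in the scored list (the feed is sorted by score, not by ID, so a plain "ID < 12345" filter would skip or repeat posts)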

Mark as seen:
โ€ข Add post IDs to Redis set: user:{id}:seen_posts
โ€ข TTL: 30 days
โ€ข Prevents showing same posts again

Track impressions:
โ€ข Log to analytics: user viewed these posts
โ€ข Used for engagement metrics
โ€ข Improve future recommendations

Response format:
{
  "posts": [...20 posts with full data...],
  "nextCursor": "12345",
  "hasMore": true
}

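Total Time: up to ~750ms when every step uses its full budget (under 1 second, but above the < 500ms p95 Feed API target, so typical runs must finish well inside the step budgets)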


Feed Diversity Strategies

Why Diversity Matters:

Without diversity:
โ€ข All posts from one seller (boring!)
โ€ข All same category (no discovery)
โ€ข All same price range (limits audience)
โ€ข Predictable feed (user stops scrolling)

With diversity:
โ€ข Varied content (keeps interest)
โ€ข Category discovery (impulse buys)
โ€ข Price options (something for everyone)
โ€ข Surprising finds (engagement boost)

Diversity Techniques:

1. Temporal Diversity (Avoid Staleness)

Problem: Old high-scored posts dominate

Solution: Recency Decay
โ€ข Posts >3 days old: Reduce score by 30%
โ€ข Posts >7 days old: Reduce score by 60%
โ€ข Forces fresh content into feed

Balance:
โ€ข Still show quality old posts if very high score
โ€ข But prioritize newer content generally
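
The decay rule is a simple multiplier on the post score. A minimal sketch, assuming the two thresholds are exclusive (a post exactly 3 days old is not yet decayed) and that the 60% reduction replaces the 30% one rather than stacking on it:

```python
def apply_recency_decay(score: float, age_days: float) -> float:
    """Reduce a post's score by 30% after 3 days and by 60% after 7 days."""
    if age_days > 7:
        return score * 0.40
    if age_days > 3:
        return score * 0.70
    return score


# A very strong old post can still outrank a mediocre fresh one.
old_but_great = apply_recency_decay(95, age_days=5)   # about 66.5
fresh_but_average = apply_recency_decay(60, age_days=1)  # 60
print(old_but_great > fresh_but_average)  # True
```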

2. Seller Diversity (Avoid Spam Feeling)

3. Category Diversity (Enable Discovery)

Problem: User pigeonholed into one category

Solution: Category Quotas in Top 20
• Max 8 posts from dominant category
• Min 2 posts from each of user's top 3 categories
• 2-4 posts from each exploratory category

Example:
User loves "Electronics" (dominant)
Also views "Fashion", "Home"
Feed includes:
โ€ข 7 Electronics posts
โ€ข 4 Fashion posts
โ€ข 3 Home posts
โ€ข 3 Sports posts (exploratory)
โ€ข 3 Beauty posts (exploratory)

4. Price Diversity (Cater to Moods)

Problem: All luxury or all budget items

Solution: Price Distribution
Target in top 20 posts:
โ€ข 30% Budget (<$50)
โ€ข 40% Mid-range ($50-200)
โ€ข 30% Premium (>$200)

Rationale:
โ€ข Some users browsing casually (show budget)
โ€ข Some ready to buy (show mid-range)
โ€ข Some aspirational shopping (show premium)
โ€ข Different moods, different needs

5. Content Type Diversity

Mix post types:
โ€ข 70% Direct product posts (main content)
โ€ข 15% Review/testimonial posts (social proof)
โ€ข 10% Tutorial/how-to posts (educational value)
โ€ข 5% Behind-the-scenes (brand building)

Prevents feed from being pure sales pitch
Provides value beyond "buy now"

What Makes Content "Trending"?

Criteria:

1. High Engagement Rate
   (Likes + Comments + Shares) ÷ Views > 10%

2. Velocity (Speed of engagement)
   Gaining engagement faster than average
   
3. Recency
   Posted within last 48 hours

4. Social Spread
   Engaged by users from different networks
   (not just one seller's followers)

Background job runs every 5 minutes:
1. Query posts from last 48 hours
2. Calculate trending scores
3. Mark posts with score >50 as trending
4. Store in Redis (key: trending_posts)
5. Feed service reads from this cache

Why every 5 minutes?
โ€ข Balance between freshness and server load
โ€ข Trending changes relatively slowly (minutes, not seconds)
โ€ข Users won't notice 5-minute delay

🔍 Search System (Complete Flow)

Text Search Process

User Types Query โ†’ Get Results

REQUEST:
GET /api/search?q=comfortable+running+shoes&limit=20

BACKEND PROCESS:

Step 1: Query Processing (< 50ms)

Input: "comfortable running shoes"

Clean & Normalize:
โ€ข Lowercase: "comfortable running shoes"
โ€ข Remove special chars
โ€ข Trim whitespace
โ€ข Tokenize: ["comfortable", "running", "shoes"]

Check for special patterns:
โ€ข Price filter: "under $100"
โ€ข Color: "red", "blue"
โ€ข Brand: "nike", "adidas"
โ€ข Size: "size 10", "large"

Extract filters if present (e.g. input "comfortable running shoes under $100"):
query = "comfortable running shoes"
filters = {price_max: 100, category: "footwear"}
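
Step 1 can be sketched with a few regular expressions. This is an illustrative parser, not the production one: the color and brand vocabularies are tiny placeholder sets, only the "under $N" and "size N" patterns are handled, and category inference (such as "footwear") is left out because it needs a catalog lookup.

```python
import re

# Placeholder vocabularies; a real system would load these from the catalog.
KNOWN_COLORS = {"red", "blue"}
KNOWN_BRANDS = {"nike", "adidas"}


def parse_query(raw: str) -> dict:
    """Normalize a search string and pull out simple structured filters."""
    text = raw.lower()
    filters: dict = {}

    # Price ceiling such as "under $100" or "under 100".
    price = re.search(r"\bunder\s*\$?(\d+(?:\.\d+)?)", text)
    if price:
        filters["price_max"] = float(price.group(1))
        text = text[:price.start()] + " " + text[price.end():]

    # Size such as "size 10" or "size large".
    size = re.search(r"\bsize\s+(\w+)", text)
    if size:
        filters["size"] = size.group(1)
        text = text[:size.start()] + " " + text[size.end():]

    # Drop special characters, then tokenize on whitespace.
    tokens = re.sub(r"[^a-z0-9\s]", " ", text).split()
    for token in tokens:
        if token in KNOWN_COLORS:
            filters.setdefault("color", token)
        elif token in KNOWN_BRANDS:
            filters.setdefault("brand", token)

    return {"query": " ".join(tokens), "tokens": tokens, "filters": filters}


print(parse_query("  Comfortable running shoes under $100! "))
# {'query': 'comfortable running shoes',
#  'tokens': ['comfortable', 'running', 'shoes'],
#  'filters': {'price_max': 100.0}}
```

Colors and brands are kept in the query text as well as in the filters, so the embedding still sees them; whether to strip them is a tuning decision.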

Step 2: Generate Query Embedding (< 100ms)

Call Python Embedding Service:
POST http://embedding-service:5000/embed-text
Body: {"text": "comfortable running shoes"}

Python processes:
โ€ข Sentence Transformers model loaded in memory
โ€ข Generates embedding: [0.45, 0.78, 0.23, ... 384 numbers]

Returns: {"embedding": [0.45, 0.78, ...], "dimensions": 384}

Spring Boot receives embedding

Step 3: Similarity Search (< 200ms)

PostgreSQL query with pgvector:

Find products where:
โ€ข text_embedding similar to query embedding (cosine similarity)
โ€ข Apply filters (price, category, etc.)
โ€ข In stock
โ€ข Not blocked

Using pgvector index (ivfflat):
โ€ข Fast approximate nearest neighbor search
โ€ข Returns top 100 candidates with similarity scores

Step 4: Ranking & Boosting (< 100ms)

Base Ranking: Similarity score (0-1)

Apply Boosts:
โ€ข In stock: +10% score
โ€ข Popular (high sales): +8% score
โ€ข New arrival (<30 days): +5% score
โ€ข High rated (4.5+ stars): +5% score
โ€ข Exact keyword match in name: +15% score

Apply User Personalization:
โ€ข User's favorite brands: +10% score
โ€ข User's typical price range: +8% score
โ€ข User previously viewed: +5% score

Penalties:
โ€ข Low stock warning: -5% score
โ€ข Low rating (<3.5 stars): -10% score
โ€ข No image: -8% score

Final Sort: By boosted score
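
The text does not say how the percentages combine, so the sketch below makes one explicit assumption: all applicable boosts and penalties are summed and applied as a single multiplier on the base similarity. The flag names are hypothetical labels for the conditions listed above.

```python
BOOSTS = {
    "in_stock": 0.10,
    "popular": 0.08,
    "new_arrival": 0.05,
    "high_rated": 0.05,
    "exact_name_match": 0.15,
    "favorite_brand": 0.10,
    "typical_price_range": 0.08,
    "previously_viewed": 0.05,
}
PENALTIES = {
    "low_stock": 0.05,
    "low_rating": 0.10,
    "no_image": 0.08,
}


def boosted_score(similarity: float, flags: set[str]) -> float:
    """Apply summed boosts and penalties as one multiplier (an assumption)."""
    unknown = flags - BOOSTS.keys() - PENALTIES.keys()
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    adjustment = (sum(BOOSTS[f] for f in flags if f in BOOSTS)
                  - sum(PENALTIES[f] for f in flags if f in PENALTIES))
    return similarity * (1 + adjustment)


def rank_results(results: list[tuple[int, float, set[str]]]) -> list[int]:
    """Sort (product_id, similarity, flags) tuples by boosted score, best first."""
    ordered = sorted(results, key=lambda r: boosted_score(r[1], r[2]), reverse=True)
    return [product_id for product_id, _, _ in ordered]


# A slightly less similar product can win once boosts and penalties apply.
print(rank_results([
    (1, 0.90, {"low_rating", "no_image"}),        # 0.90 x 0.82
    (2, 0.80, {"in_stock", "exact_name_match"}),  # 0.80 x 1.25
]))  # [2, 1]
```

Note that boosted scores can exceed 1.0, so they should be treated as ranking keys rather than displayed as similarity values.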

Step 5: Response (< 50ms)

Take top 20 results

Format response:
{
  "query": "comfortable running shoes",
  "results": [
    {
      "id": 123,
      "name": "Nike Air Max Running Shoes",
      "price": 89.99,
      "similarity": 0.92,
      "rating": 4.7,
      ...
    },
    ... 19 more
  ],
  "total": 847,
  "took": "287ms"
}

Total Time: ~500ms (good search experience!)


Visual Search Process

User Uploads Image โ†’ Get Visually Similar Products

REQUEST:
POST /api/search/visual
Content-Type: multipart/form-data
Body: image file

BACKEND PROCESS:

Step 1: Image Reception & Validation (< 100ms)

Receive uploaded file

Validate:
โ€ข File size < 10MB
โ€ข Format: JPG, PNG, WEBP
โ€ข Not corrupted

Preprocess:
โ€ข Resize if needed (max 1024x1024)
โ€ข Convert to RGB if grayscale
โ€ข Optimize file size

Temporarily store:
โ€ข Option A: Memory (for immediate processing)
โ€ข Option B: Temp file system (for large images)
โ€ข Option C: S3 (for tracking/analytics)

Step 2: Generate Image Embedding (< 300ms)

Convert image to bytes

Call Python Embedding Service:
POST http://embedding-service:5000/embed-image-bytes
Body: {"image_base64": "iVBORw0KGgo..."}

Python processes:
โ€ข CLIP model loaded in memory
โ€ข Processes image pixels (not filename!)
โ€ข Understands visual features: color, shape, style
โ€ข Generates embedding: [0.91, 0.23, 0.67, ... 512 numbers]

Returns: {"embedding": [0.91, 0.23, ...], "dimensions": 512}

Spring Boot receives embedding

Step 3: Visual Similarity Search (< 200ms)

PostgreSQL query with pgvector:

Find products where:
โ€ข image_embedding similar to upload embedding
โ€ข In stock
โ€ข Has image (obviously)

Using pgvector index on image_embedding:
โ€ข Fast ANN search
โ€ข Returns top 50 candidates with visual similarity scores

Step 4: Ranking & Filtering (< 100ms)

Base Ranking: Visual similarity (0-1)

Filter Options (if user specifies):
โ€ข Category filter: "Show only shoes"
โ€ข Price range: "$50-150"
โ€ข Brand preference

Apply Boosts:
โ€ข Popular in category: +10%
โ€ข High quality images: +5%
โ€ข Multiple product images: +5%

Sort by final score

Step 5: Response (< 50ms)

Return top 20 visually similar products

Response format:
{
  "results": [
    {
      "id": 456,
      "name": "Similar Red Sneakers",
      "image_url": "...",
      "similarity": 0.88,
      "price": 79.99
    },
    ... 19 more
  ],
  "message": "Found 20 visually similar products"
}

Total Time: ~750ms-1s (acceptable for visual search!)


Search Enhancements

Query Understanding (NLP Techniques):

Intent Detection:

"gift for dad" โ†’ Gift intent
โ†’ Boost: Gift-appropriate products
โ†’ Filter: Hide inappropriate items

"cheap laptop" โ†’ Budget intent
โ†’ Sort: Price low to high
โ†’ Filter: <$500

"best running shoes" โ†’ Quality intent
โ†’ Sort: Rating high to low
โ†’ Boost: Reviews, testimonials

"red nike shoes size 10" โ†’ Specific intent
โ†’ Exact filters applied
โ†’ Precise matching

Spell Correction:

User types: "iphone 15 pro mac"
System detects: "mac" likely typo
Suggestion: "Did you mean: iphone 15 pro max?"

Normalize capitalization of known product names:
โ€ข "iphone" โ†’ "iPhone"
โ€ข "macbook" โ†’ "MacBook"
โ€ข "airpods" โ†’ "AirPods"

Search Suggestions (As User Types):

User types: "run"
Suggestions appear:
โ€ข "running shoes" (popular)
โ€ข "running shorts" (trending)
โ€ข "running watch" (category)

Based on:
โ€ข Popular searches (last 7 days)
โ€ข User's past searches
โ€ข Trending products
โ€ข Category completions

After search results, suggest:
"People also searched for:"
โ€ข "nike running shoes" (brand specific)
โ€ข "trail running shoes" (category variant)
โ€ข "running shoes for women" (gender variant)

Generated from:
โ€ข Search session data (what others searched next)
โ€ข Similar query embeddings
โ€ข Category relationships

🛠️ Technology Stack

Backend (Spring Boot)

Core Framework:

Libraries & Dependencies:

Database

Primary Database:

Caching:

Embedding Service (Python)

Framework:

ML Libraries:

Models:

Infrastructure

Containerization:

Orchestration (Production):

Cloud Services (Optional):

Monitoring & Observability

Metrics:

Logging:

Tracing:


📅 Implementation Phases

Phase 1: MVP - Basic Recommendations (Weeks 1-3)

Goal: Launch with working feed and basic recommendations

Features to Implement:

โœ… Social Feed (Core)

โœ… Basic Product Similarity

โœ… Engagement Tracking

โœ… Group Buy (Core)

โœ… Installment Display

No Embeddings in Phase 1!

Success Metrics:


Phase 2: Enhanced Recommendations (Weeks 4-6)

Goal: Add semantic search and better personalization

Features to Implement:

โœ… Text Embeddings

โœ… Improved Product Similarity

โœ… User Interest Profiling

โœ… Enhanced Feed Algorithm

โœ… Group Buy Intelligence

โœ… Installment Recommendations

Success Metrics:


Phase 3: Visual Search & Advanced Features (Weeks 7-9)

Goal: Add image-based features and optimize

Features to Implement:

โœ… Image Embeddings

โœ… Shop the Look

โœ… Feed Optimization

โœ… Caching Strategy

โœ… A/B Testing Framework

Success Metrics:


Phase 4: Optimization & ML (Weeks 10+)

Goal: Fine-tune and scale

Features to Implement:

โœ… Performance Optimization

โœ… Advanced Personalization

โœ… Recommendation Quality

โœ… Scaling

โœ… Analytics Dashboard

Success Metrics:


⚡ Performance & Scaling

Performance Targets

API Response Times (95th percentile):

Feed API: < 500ms
Search API: < 600ms
Visual Search: < 1.5s
Product Detail: < 200ms
Group Buy List: < 300ms

Database Query Times:

Simple queries: < 50ms
Vector similarity (with index): < 200ms
Complex joins: < 300ms

Caching Hit Rates:

User following list: > 90%
Trending posts: > 95%
Product embeddings: > 80%
Search query results: > 60%

Optimization Strategies

1. Database Optimization

Indexing Strategy:
โ€ข Primary keys (automatic)
โ€ข Foreign keys (user_id, product_id, etc.)
โ€ข Frequently filtered columns (category, price, created_at)
โ€ข Vector columns (pgvector ivfflat index)
โ€ข Composite indexes for common queries

Query Optimization:
โ€ข Use EXPLAIN ANALYZE to identify slow queries
โ€ข Avoid N+1 queries (use JOIN or batch fetch)
โ€ข Limit result sets appropriately
โ€ข Use covering indexes where possible

Connection Pooling:
โ€ข HikariCP (default in Spring Boot)
โ€ข Pool size: 10-20 connections per instance
โ€ข Connection timeout: 30 seconds
โ€ข Idle timeout: 10 minutes

2. Caching Strategy

Redis Cache Layers:

Layer 1: Hot Data (TTL: 5-15 minutes)
โ€ข Trending posts
โ€ข Active group buys
โ€ข Real-time counters

Layer 2: Warm Data (TTL: 1-6 hours)
โ€ข User following lists
โ€ข User interest profiles
โ€ข Popular products

Layer 3: Cold Data (TTL: 24 hours)
โ€ข Similar product lists (pre-computed)
โ€ข Static content
โ€ข Configuration data

Layer 4: Session Data (TTL: 7-30 days)
โ€ข User seen posts
โ€ข Shopping cart
โ€ข Search history

Cache Invalidation:
โ€ข Update cache when data changes
โ€ข Lazy invalidation (TTL expiry)
โ€ข Active invalidation (delete on update)

3. Embedding Service Optimization

Model Loading:
โ€ข Load models at startup (not per request)
โ€ข Keep in memory (RAM)
โ€ข Use GPU if available (5-10x faster)

Batch Processing:
โ€ข Process multiple embeddings together
โ€ข Amortize model overhead
โ€ข 2-3x throughput improvement

Request Queuing:
โ€ข Queue embedding requests
โ€ข Batch process every 100ms
โ€ข Balance latency vs throughput

Caching:
โ€ข Cache common query embeddings
โ€ข Cache product embeddings in Redis
โ€ข Reduce Python service calls

4. Feed Generation Optimization

Pre-computation:
โ€ข Background job: Calculate trending posts (every 5 min)
โ€ข Background job: Update user interests (daily)
โ€ข Background job: Pre-compute popular similarities

Parallel Processing:
โ€ข Fetch candidates in parallel (following + trending + local)
โ€ข Score posts in parallel (if list is large)
โ€ข Use CompletableFuture or reactive programming

Result Caching:
โ€ข Cache full feed for 1-2 minutes (same user, same request)
โ€ข Cache candidate lists for 5 minutes
โ€ข Invalidate on user action (post, like, follow)

5. Search Optimization

Index Strategy:
โ€ข pgvector ivfflat index on embeddings
โ€ข Full-text search index (if needed)
โ€ข Composite indexes on filters

Query Rewriting:
โ€ข Combine filters in single query
โ€ข Use covering indexes
โ€ข Avoid sequential scans

Result Caching:
โ€ข Cache popular search queries
โ€ข TTL: 10-30 minutes
โ€ข Personalized results: 2-5 minutes

Scaling Strategy

Horizontal Scaling (Add More Servers):

Application Layer (Spring Boot):
โ€ข Stateless design (session in Redis, not memory)
โ€ข Load balancer distributes requests
โ€ข Scale out to 5, 10, 20+ instances
โ€ข Auto-scaling based on CPU/memory

Embedding Service (Python):
โ€ข Run multiple instances
โ€ข Load balancer in front
โ€ข Each instance has model loaded
โ€ข Scale based on request rate

Database (PostgreSQL):
โ€ข Read replicas for heavy read load
โ€ข Write to master, read from replicas
โ€ข Connection pooling per instance
โ€ข Separate analytics queries to replica

Vertical Scaling (Bigger Servers):

When to Scale Up:
โ€ข Database: More RAM for cache, faster CPU for queries
โ€ข Redis: More RAM for bigger cache
โ€ข Embedding service: GPU for faster inference

Sweet Spot:
โ€ข Application: 4-8 CPU, 8-16 GB RAM
โ€ข Database: 8-16 CPU, 32-64 GB RAM
โ€ข Redis: 4 CPU, 16-32 GB RAM
โ€ข Embedding: 4-8 CPU, 16-32 GB RAM, GPU optional

Caching Layers:

Level 1: Application Cache (Spring @Cacheable)
โ€ข Short-lived data (seconds to minutes)
โ€ข User session data
โ€ข Request-scoped cache

Level 2: Redis Cache
โ€ข Medium-lived data (minutes to hours)
โ€ข Cross-instance sharing
โ€ข High performance

Level 3: CDN (Images, Static Assets)
โ€ข Long-lived data (hours to days)
โ€ข Globally distributed
โ€ข Reduce origin load

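Level 4: Database Buffer Cache
• PostgreSQL shared_buffers plus the OS page cache keep hot table and index pages in memory
• Automatic by database
• Repeated queries benefit from cached pages (PostgreSQL does not cache query results themselves)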

Database Partitioning (Future):

When Needed:
โ€ข >10 million products
โ€ข >100 million posts
โ€ข Query performance degrades

Partitioning Strategy:
โ€ข Range partition by date (posts)
โ€ข Hash partition by user_id (user data)
โ€ข List partition by category (products)

Sharding (Advanced):
โ€ข Separate database per region
โ€ข Shard by user_id hash
โ€ข Cross-shard queries expensive (avoid)

📈 Metrics & Monitoring

Key Performance Indicators (KPIs)

User Engagement:

Feed Metrics:
โ€ข Feed Engagement Rate: (Likes + Comments + Shares) / Views
  Target: >5%
โ€ข Scroll Depth: Average posts viewed per session
  Target: >30 posts
โ€ข Session Duration: Time spent in app
  Target: >10 minutes
โ€ข Return Rate: % users returning within 7 days
  Target: >40%

Conversion Metrics:

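Purchase Funnel:
• Feed View → Product Page: 8-12%
• Product Page → Add to Cart: 15-20%
• Cart → Purchase: 30-40%
• Overall Conversion: 3-5% (the three step rates above multiply to only about 0.4-1% per feed view, so this target has to be measured on a different base, such as per session)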

Group Buy:
โ€ข Group View โ†’ Join: 25-35%
โ€ข Group Completion Rate: 70-80%
โ€ข Average Group Fill Time: < 48 hours

Installment:
โ€ข Installment Shown โ†’ Selected: 40-50%
โ€ข Premium Product with Installment: 20-25% conversion boost

Recommendation Quality:

Click-Through Rates:
โ€ข "Similar Products": 12-18%
โ€ข Search Results: 8-12%
โ€ข Feed Recommendations: 5-8%
โ€ข Group Buy Suggestions: 20-30%

Relevance Metrics:
• Precision@10: >60% (at least 6 of the top 10 results are relevant)
• Mean Reciprocal Rank: >0.7 (first relevant result usually at rank 1 or 2)
• User Satisfaction: Survey score >4/5
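
Both relevance metrics are easy to compute offline from labeled queries. A minimal sketch, assuming each evaluated query comes with its ranked result IDs and a set of IDs judged relevant; the function names are illustrative:

```python
def precision_at_k(ranked_ids: list[int], relevant_ids: set[int], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def mean_reciprocal_rank(queries: list[tuple[list[int], set[int]]]) -> float:
    """Average of 1/rank of the first relevant result (0 if none is found)."""
    if not queries:
        raise ValueError("need at least one query")
    total = 0.0
    for ranked_ids, relevant_ids in queries:
        for rank, item in enumerate(ranked_ids, start=1):
            if item in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(queries)


print(precision_at_k(list(range(1, 11)), {1, 2, 3, 5, 8, 9, 11}))  # 0.6
print(mean_reciprocal_rank([
    ([1, 2, 3], {1}),  # first relevant at rank 1 -> 1.0
    ([1, 2, 3], {2}),  # first relevant at rank 2 -> 0.5
    ([1, 2, 3], {9}),  # nothing relevant        -> 0.0
]))  # 0.5
```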

System Performance:

Response Times (p95):
โ€ข Feed Load: < 500ms
โ€ข Search: < 600ms
โ€ข Product Page: < 200ms
โ€ข Visual Search: < 1.5s

Availability:
• Uptime: >99.9% (about 43 minutes of downtime per 30-day month)
• Error Rate: <0.1%
• Successful API calls: >99.5%

Resource Utilization:
โ€ข CPU: 60-80% (headroom for spikes)
โ€ข Memory: 70-85%
โ€ข Database connections: 50-70% of pool

Monitoring Setup

Application Monitoring:

Spring Boot Actuator:
โ€ข /actuator/health: System health
โ€ข /actuator/metrics: Performance metrics
โ€ข /actuator/prometheus: Prometheus export

Custom Metrics:
โ€ข Feed generation time
โ€ข Embedding generation time
โ€ข Similarity search time
โ€ข Cache hit rates
โ€ข Recommendation click-through rates

Database Monitoring:

PostgreSQL Metrics:
โ€ข Query execution time
โ€ข Index usage
โ€ข Table sizes
โ€ข Connection pool status
โ€ข Slow query log

pgvector Specific:
โ€ข Vector search time
โ€ข Index efficiency
โ€ข Embedding storage size

Cache Monitoring:

Redis Metrics:
โ€ข Hit rate
โ€ข Memory usage
โ€ข Eviction rate
โ€ข Command execution time
โ€ข Key expiration rate

Business Metrics Dashboard:

Real-Time:
โ€ข Active users now
โ€ข Feeds served (per minute)
โ€ข Searches executed
โ€ข Group buys active
โ€ข Purchases completed

Daily:
โ€ข New users
โ€ข Daily active users (DAU)
โ€ข Group buy completion rate
โ€ข Average order value
โ€ข Conversion rate

Weekly/Monthly:
โ€ข User retention (cohort analysis)
โ€ข Revenue trends
โ€ข Top categories
โ€ข Best performing recommendations
โ€ข A/B test results

🎯 Success Criteria

Phase 1 (MVP) Success:

Phase 2 (Enhanced) Success:

Phase 3 (Advanced) Success:

Long-Term Success:


📝 Final Notes

Critical Principles

1. Start Simple, Scale Smart:

2. User Data is Gold:

3. Social Proof Wins:

4. Affordability = Access:

5. Feed > Search:

6. Diversity Matters:

7. Performance is Feature:

When to Use What

Embeddings:

Scoring System:

Social Graph:

Collaborative Filtering:

All Together:


🎓 Conclusion

This architecture document provides a complete blueprint for Nexgate's recommendation system. It combines:

The system is designed to:

  1. Start simple (MVP with basic features)
  2. Scale smart (add complexity as needed)
  3. Learn continuously (from user behavior)
  4. Prioritize performance (fast = engaging)
  5. Drive conversions (discovery → purchase)

Remember: The best recommendation system is one that:

Good luck building Nexgate! 🚀


Document Version: 1.0
Last Updated: November 2025
Next Review: After Phase 1 completion


Understanding Recommendation Systems - From Zero to Hero 📚


🎯 What You'll Learn

This guide explains recommendation systems from first principles, with real-world examples, formulas, and the math behind them. The focus is on concepts, not code!


📖 Chapter 1: What Are Recommendation Systems?

The Simple Definition

A recommendation system is a tool that predicts what you might like based on:

Real-World Analogy

Imagine a smart bookstore clerk:

That's essentially what a recommendation system does!


🏗️ Chapter 2: The Three Main Types

Type 1: Content-Based Filtering

Concept: Recommend items similar to what you liked before.

How it works:

  1. Analyze features of items you liked
  2. Find other items with similar features
  3. Recommend those items

Example:

You liked:
- "Harry Potter" (Fantasy, Magic, Young Adult, Adventure)
- "Lord of the Rings" (Fantasy, Magic, Epic, Adventure)

System recommends:
- "The Hobbit" (Fantasy, Magic, Adventure) ✅ Very similar!
- "Chronicles of Narnia" (Fantasy, Magic, Young Adult) ✅ Good match!

The Math Behind It:

Each item is represented as a feature vector:

Harry Potter = [Fantasy: 1, Magic: 1, Young Adult: 1, Adventure: 1, Romance: 0]
Lord of the Rings = [Fantasy: 1, Magic: 1, Young Adult: 0, Adventure: 1, Romance: 0]
The Hobbit = [Fantasy: 1, Magic: 1, Young Adult: 0, Adventure: 1, Romance: 0]

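Similarity Calculation (Cosine Similarity):

Similarity = (A · B) / (||A|| × ||B||)

Where:
A · B = Dot product (multiply matching features, then add up the products)
||A|| = Magnitude (length) of vector A
||B|| = Magnitude (length) of vector B

Result: Number between 0 (nothing in common) and 1 (same direction) for non-negative feature vectors like these; if negative values are allowed, the range is -1 to 1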
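
Optional, for readers who like to check the math in code: a minimal Python sketch of cosine similarity applied to the three book vectors above. The variable names are illustrative; the feature order is [Fantasy, Magic, Young Adult, Adventure, Romance].

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


harry_potter = [1, 1, 1, 1, 0]
lord_of_the_rings = [1, 1, 0, 1, 0]
the_hobbit = [1, 1, 0, 1, 0]

print(round(cosine_similarity(harry_potter, the_hobbit), 3))
print(round(cosine_similarity(lord_of_the_rings, the_hobbit), 3))
```

"The Hobbit" shares every feature with "Lord of the Rings", so that pair scores highest; "Harry Potter" has one extra feature (Young Adult), which pulls its score with "The Hobbit" slightly lower.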

Pros:

Cons:


Type 2: Collaborative Filtering

Concept: "People like you also liked..."

How it works:

  1. Find users similar to you
  2. See what they liked
  3. Recommend those items to you

Example:

You (Alice):
- Liked: iPhone, MacBook, AirPods
- Rating: 5 stars, 5 stars, 4 stars

Similar User (Bob):
- Liked: iPhone, MacBook, AirPods, Apple Watch
- Rating: 5 stars, 5 stars, 5 stars, 5 stars

Recommendation for Alice:
→ Apple Watch (because Bob, who has similar taste, loves it!)

Two Approaches:

A. User-Based Collaborative Filtering

Formula for User Similarity (Pearson Correlation):

similarity(user_a, user_b) = 
  Σ(rating_a - avg_a)(rating_b - avg_b) 
  / ( √[Σ(rating_a - avg_a)²] × √[Σ(rating_b - avg_b)²] )

(sums run over the items both users have rated)

Result: Number between -1 (opposite taste) and 1 (identical taste)

Example Calculation:

Alice's ratings: [5, 4, 3, ?, 2]
Bob's ratings:   [5, 5, 3, 4, 2]
Carol's ratings: [1, 2, 3, 4, 5]

Similarity(Alice, Bob) ≈ 0.95 (very similar!)
Similarity(Alice, Carol) ≈ -0.98 (opposite taste!)
(computed over the four items Alice has rated)

Predict Alice's rating for item 4:
→ Use Bob's rating (4) because Bob is most similar
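
Optional: a short Python sketch of the Pearson calculation for this example. It assumes the means and sums are taken only over items both users have rated (Alice's missing rating is written as None), which is one common convention rather than the only one.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    pairs = [(a, b) for a, b in zip(ratings_a, ratings_b)
             if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a, _ in pairs))
    spread_b = math.sqrt(sum((b - mean_b) ** 2 for _, b in pairs))
    if spread_a == 0 or spread_b == 0:
        return 0.0  # a user who rates everything the same has no pattern
    return numerator / (spread_a * spread_b)


alice = [5, 4, 3, None, 2]
bob = [5, 5, 3, 4, 2]
carol = [1, 2, 3, 4, 5]

print(round(pearson(alice, bob), 2))
print(round(pearson(alice, carol), 2))
```

Bob comes out strongly positive and Carol strongly negative, which is why Bob's rating is the one used to fill in Alice's missing item.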

B. Item-Based Collaborative Filtering

Instead of finding similar users, find similar items!

Example:

People who bought iPhone also bought:
- iPhone Case (90% of buyers)
- Screen Protector (85% of buyers)
- AirPods (60% of buyers)
- Apple Watch (40% of buyers)

You bought iPhone → Recommend iPhone Case (highest co-purchase rate!)

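Formula for Item Similarity:

similarity(item_i, item_j) = 
  Number of users who liked both items
  / √(Users who liked item_i × Users who liked item_j)

This is cosine similarity on binary liked/not-liked vectors (also known as the Ochiai coefficient). "Jaccard Similarity" is a related but different measure: users who liked both items / users who liked either item.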
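
The formula above divides the overlap by the geometric mean of the two audience sizes, while Jaccard divides it by the size of the union. A minimal Python sketch of both, using made-up user ids:

```python
import math


def cosine_item_similarity(users_i, users_j):
    """Overlap divided by the geometric mean of the two audience sizes."""
    if not users_i or not users_j:
        return 0.0
    return len(users_i & users_j) / math.sqrt(len(users_i) * len(users_j))


def jaccard_similarity(users_i, users_j):
    """Overlap divided by the number of users who liked either item."""
    union = users_i | users_j
    return len(users_i & users_j) / len(union) if union else 0.0


# Hypothetical sets of users who liked each item
iphone_fans = {"u1", "u2", "u3", "u4"}
case_fans = {"u1", "u2", "u3"}

print(round(cosine_item_similarity(iphone_fans, case_fans), 3))
print(round(jaccard_similarity(iphone_fans, case_fans), 3))
```

Both measures are 1 for identical audiences and 0 for disjoint ones; they differ in between, so pick one and use it consistently when ranking items.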

Pros:

Cons:


Type 3: Hybrid Systems

Concept: Combine multiple approaches for better results!

Common Combinations:

A. Weighted Hybrid

Final Score = 
  (0.5 × Content-Based Score) + 
  (0.5 × Collaborative Score)

Example:
Product X:
- Content similarity to your likes: 0.8
- People like you also bought it: 0.6
- Final score: (0.5 × 0.8) + (0.5 × 0.6) = 0.7

B. Switching Hybrid

IF user is new (no history):
    → Use Content-Based (based on item features)
ELSE IF user has lots of history:
    → Use Collaborative (based on similar users)

C. Cascade Hybrid

Step 1: Content-Based filters 1000 → 100 items
Step 2: Collaborative ranks those 100 → Top 10
Step 3: Show top 10 to user
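
Optional: the three hybrid patterns can be sketched in a few lines of Python. Everything here is illustrative; in particular, the `min_history=5` cut-off for "lots of history" is an assumption, since the guide does not give a number.

```python
def weighted_hybrid(content, collab, w_content=0.5, w_collab=0.5):
    """A. Blend two scores with fixed weights."""
    return w_content * content + w_collab * collab


def switching_hybrid(history_size, content, collab, min_history=5):
    """B. Use the collaborative score only once the user has enough history.

    min_history is an assumed threshold, not a value from this guide.
    """
    return collab if history_size >= min_history else content


def cascade_hybrid(candidates, content_scores, collab_scores,
                   shortlist=100, top_n=10):
    """C. Shortlist by content score, then rank the shortlist collaboratively."""
    shortlisted = sorted(candidates, key=lambda item: content_scores[item],
                         reverse=True)[:shortlist]
    return sorted(shortlisted, key=lambda item: collab_scores.get(item, 0.0),
                  reverse=True)[:top_n]


# Product X from the weighted example: content 0.8, collaborative 0.6
print(weighted_hybrid(0.8, 0.6))
```

The cascade is the cheapest at scale because the more expensive collaborative ranking only runs on the shortlist.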

📐 Chapter 3: The Math Explained Simply

Similarity Measures

These are ways to measure "how alike" two things are.

1. Cosine Similarity (Most Common)

Imagine two arrows in space:

Arrow A points → (3, 4)
Arrow B points → (4, 3)

Angle between them = small → Similar!
Angle = 90° → Completely different

Formula:

cosine_similarity = cos(θ) = (A · B) / (||A|| × ||B||)

Where:
A · B = (3×4) + (4×3) = 12 + 12 = 24
||A|| = √(3² + 4²) = √25 = 5
||B|| = √(4² + 3²) = √25 = 5

Result = 24 / (5 × 5) = 24/25 = 0.96 (very similar!)

Range: 0 (perpendicular) to 1 (identical direction) when all components are non-negative; in general, -1 (opposite direction) to 1


2. Euclidean Distance

Think of it as "crow flies" distance:

Point A = (1, 2)
Point B = (4, 6)

Distance = √[(4-1)² + (6-2)²]
         = √[9 + 16]
         = √25
         = 5

Closer distance = More similar

Problem: Doesn't work well with different scales!

Price: $10 vs $15 (difference = 5)
Rating: 3 vs 4 stars (difference = 1)

The price difference dominates unfairly!

Solution: Normalize first (scale everything 0-1)
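
Optional: a small Python sketch of the scale problem and the fix. The value ranges used for scaling (price 0-100, rating 1-5) are assumptions made up for this illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(value, low, high):
    """Map a value from the range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


# (price in dollars, rating in stars)
product_a = (10, 3)
product_b = (15, 4)

raw_distance = euclidean(product_a, product_b)  # dominated by the $5 price gap

# Assumed ranges: price 0-100 dollars, rating 1-5 stars
scaled_a = (min_max_scale(10, 0, 100), min_max_scale(3, 1, 5))
scaled_b = (min_max_scale(15, 0, 100), min_max_scale(4, 1, 5))
scaled_distance = euclidean(scaled_a, scaled_b)  # now the rating gap matters more

print(round(raw_distance, 3), round(scaled_distance, 3))
```

After scaling, the one-star gap (a quarter of the rating range) counts for more than the $5 gap (a twentieth of the price range), which matches intuition better.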


3. Pearson Correlation

Measures if two things move together:

Alice rates: [5, 4, 3, 2, 1]
Bob rates:   [5, 4, 3, 2, 1]
→ Perfect correlation = 1.0 (they always agree!)

Alice rates: [5, 4, 3, 2, 1]
Carol rates: [1, 2, 3, 4, 5]
→ Perfect negative correlation = -1.0 (opposite taste!)

Formula:

r = Σ[(x - x̄)(y - ȳ)] / √[Σ(x - x̄)² × Σ(y - ȳ)²]

Where:
x̄ = average of x
ȳ = average of y

Range: -1 (opposite) to +1 (identical)


Matrix Factorization (Advanced!)

The Idea: Break down the user-item matrix into hidden patterns.

Real-World Example:

Movie ratings matrix:
           Action  Comedy  Drama
Alice        5       2       4
Bob          5       1       3
Carol        1       5       2

Hidden factors might be:
Factor 1: "Likes serious content"
Factor 2: "Likes funny content"

Alice = [High Factor 1, Low Factor 2] → Likes Action/Drama
Carol = [Low Factor 1, High Factor 2] → Likes Comedy

This is the technique that Netflix Prize teams made famous!

They discovered hidden factors like:

Formula (Simplified):

Rating = User_Vector · Item_Vector

Alice's vector = [0.9, 0.2] (serious, not funny)
Action movie vector = [0.8, 0.1] (serious, not funny)

Predicted rating = (0.9 × 0.8) + (0.2 × 0.1) 
                 = 0.72 + 0.02 
                 = 0.74 (normalized, 0-1 scale) 
                 ≈ 3.7 stars (0.74 × 5)
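
Optional: the prediction step is just a dot product. A minimal Python sketch, assuming the simplest possible mapping from the 0-1 score to stars (multiply by 5); real systems usually learn the vectors and add bias terms, which this sketch leaves out.

```python
def predict_score(user_vector, item_vector):
    """Dot product of the user's and the item's hidden-factor vectors."""
    return sum(u * v for u, v in zip(user_vector, item_vector))


def to_stars(score, max_stars=5):
    """Assumed linear mapping from a 0-1 score to a star rating."""
    return score * max_stars


alice = [0.9, 0.2]          # [serious, funny]
action_movie = [0.8, 0.1]   # [serious, funny]
comedy_movie = [0.1, 0.9]   # hypothetical item, not from the example above

score = predict_score(alice, action_movie)
print(round(score, 2), round(to_stars(score), 1))
print(round(predict_score(alice, comedy_movie), 2))
```

The comedy scores far lower for Alice because her "funny" factor is small, which is the whole idea: taste and content are compared in the same hidden-factor space.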

🎓 Chapter 4: Real-World Examples Explained

Example 1: Netflix

What they use: Hybrid system with heavy collaborative filtering + content-based

How it works (the weights are illustrative, not Netflix's published numbers):

Step 1: Collaborative Filtering
- Find users who rated movies similarly to you
- Weight: 60%

Step 2: Content-Based
- Analyze genres, actors, directors you like
- Weight: 25%

Step 3: Trending/Popular
- What's hot right now
- Weight: 15%

Final Score = (0.6 × Collaborative) + (0.25 × Content) + (0.15 × Trending)

Why it works:


Example 2: Amazon

What they use: Primarily item-based collaborative filtering

The Famous Algorithm: "Customers who bought X also bought Y"

How it's calculated:

iPhone → Case: 85% co-purchase rate
iPhone → Screen Protector: 78% co-purchase rate
iPhone → Charger: 65% co-purchase rate
iPhone → Laptop: 5% co-purchase rate

Formula:
Co-purchase rate = 
  (Times X and Y bought together) / (Times X was bought)

Example:
iPhone bought: 1000 times
iPhone + Case bought together: 850 times
Co-purchase rate = 850/1000 = 85%
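
Optional: counting co-purchase rates from order data takes only a few lines of Python. The orders below are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_purchase_rates(orders):
    """Return {(x, y): share of orders containing x that also contain y}."""
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = set(order)  # ignore duplicates within one order
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered pairs (x, y)
    return {(x, y): n / item_counts[x] for (x, y), n in pair_counts.items()}


orders = [
    ["iphone", "case"],
    ["iphone", "case", "charger"],
    ["iphone"],
    ["iphone", "case"],
    ["laptop", "charger"],
]
rates = co_purchase_rates(orders)
print(rates[("iphone", "case")], rates[("case", "iphone")])
```

Note that the rate is not symmetric: most iPhone buyers add a case, but every case buyer in this sample also bought an iPhone.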

Why it works:


Example 3: Spotify

What they use: Hybrid with collaborative + audio analysis + social

Three Recommendation Types:

A. Collaborative Filtering

Your playlists: [Pop, Rock, Indie]
Similar user's playlists: [Pop, Rock, Indie, Alternative]
→ Recommend Alternative music

B. Audio Analysis (Content-Based)

Song features analyzed:
- Tempo: 120 BPM
- Key: C Major
- Energy: High
- Valence (happiness): Medium
- Acousticness: Low

Find songs with similar audio features!

C. Social

Your friends listen to:
- Artist X: 80% of friends
- Artist Y: 60% of friends
→ Recommend Artist X

Discover Weekly Playlist (illustrative mix):

= 30% Collaborative (users like you)
+ 30% Audio similarity (songs like yours)
+ 20% New releases in your genres
+ 20% Social (what friends listen to)

Example 4: TikTok (The King!)

What they use: Engagement prediction model (ML-based)

How it works:

For each video, predict:
- Will user watch to the end? (Completion rate)
- Will user like it?
- Will user comment?
- Will user share?
- Will user follow creator?

Score (illustrative weights, not TikTok's actual values) = 
  (10 × Completion prediction) +
  (5 × Like prediction) +
  (8 × Comment prediction) +
  (12 × Share prediction) +
  (15 × Follow prediction)

Show videos with highest predicted score!

Features considered:

Video features:
- Category/hashtags
- Music used
- Duration
- Captions

User features:
- Past liked categories
- Watch time patterns
- Engagement history
- Language preference

Interaction features:
- Time of day
- Device type
- Network speed

Why it's so addictive:


📊 Chapter 5: Common Formulas Reference

1. Weighted Score (Most Common in Practice!)

Final Score = Σ(Weight_i × Score_i)

Example (E-commerce):
Product Score = 
  (0.35 × Social_Score) +
  (0.25 × Engagement_Score) +
  (0.20 × Personalization_Score) +
  (0.15 × Recency_Score) +
  (0.05 × Quality_Score)

Each component score is 0-100, normalized
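
Optional: the e-commerce example as a small Python sketch. The weights come from the formula above; the component scores for the sample product are made up.

```python
WEIGHTS = {
    "social": 0.35,
    "engagement": 0.25,
    "personalization": 0.20,
    "recency": 0.15,
    "quality": 0.05,
}


def product_score(components, weights=WEIGHTS):
    """Weighted sum of component scores, each expected on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1 so the result stays on 0-100")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical component scores for one product
sample = {"social": 80, "engagement": 60, "personalization": 70,
          "recency": 50, "quality": 90}
print(round(product_score(sample), 1))
```

Keeping the weights in one table makes them easy to tune or A/B test without touching the scoring logic.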

2. Recency Decay

Recency Score = Base_Score × e^(-λ × time)

Where:
λ (lambda) = decay rate (how fast score decreases)
time = hours/days since creation
e ≈ 2.71828 (base of the natural logarithm)

Example:
Base score = 100
λ = 0.1 per hour (the score halves roughly every 7 hours)
After 24 hours: 100 × e^(-0.1 × 24) = 100 × 0.091 = 9.1

Interpretation: Old content gets much lower score

Simpler Alternative (Step Function):

IF age < 1 hour: Score = 100
ELSE IF age < 6 hours: Score = 80
ELSE IF age < 24 hours: Score = 50
ELSE IF age < 7 days: Score = 20
ELSE: Score = 5
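
Optional: both decay styles in Python, for comparison. Time is measured in hours here, matching the example above; the function names are illustrative.

```python
import math


def exponential_decay(base_score, age_hours, decay_rate=0.1):
    """Smooth decay: the score shrinks by the same factor every hour."""
    return base_score * math.exp(-decay_rate * age_hours)


def half_life_hours(decay_rate):
    """Hours until an exponentially decaying score drops to half."""
    return math.log(2) / decay_rate


def step_decay(age_hours):
    """Step-function alternative using the thresholds listed above."""
    if age_hours < 1:
        return 100
    if age_hours < 6:
        return 80
    if age_hours < 24:
        return 50
    if age_hours < 7 * 24:
        return 20
    return 5


for age in (0.5, 3, 12, 24, 72, 24 * 10):
    print(age, round(exponential_decay(100, age), 1), step_decay(age))
```

The half-life is often an easier knob to reason about than λ itself: decide how long content should stay "fresh", then set λ = ln(2) / half-life.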

3. Engagement Rate

Engagement Rate = 
  (Likes + Comments + Shares) / Views

Example:
Video: 10,000 views, 500 likes, 50 comments, 30 shares
Engagement = (500 + 50 + 30) / 10,000 = 0.058 = 5.8%

Good engagement: > 5%
Viral content: > 15%

4. Click-Through Rate (CTR)

CTR = Clicks / Impressions

Example:
Product shown 1000 times
Clicked 50 times
CTR = 50/1000 = 0.05 = 5%

Use CTR to rank items:
Higher CTR = Better recommendation

5. Conversion Rate

Conversion Rate = Purchases / Clicks

Example:
Product clicked 100 times
Purchased 10 times
Conversion = 10/100 = 10%

Ultimate metric: Did recommendation lead to action?
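
Optional: the three rate formulas (engagement, CTR, conversion) share one shape, so a single helper covers them. A minimal Python sketch using the numbers from the examples above:

```python
def rate(numerator, denominator):
    """Safe ratio: returns 0.0 when nothing has been shown or clicked yet."""
    return numerator / denominator if denominator else 0.0


engagement_rate = rate(500 + 50 + 30, 10_000)  # (likes + comments + shares) / views
click_through_rate = rate(50, 1_000)           # clicks / impressions
conversion_rate = rate(10, 100)                # purchases / clicks

# Chaining CTR and conversion gives purchases per impression
purchases_per_impression = click_through_rate * conversion_rate

print(f"{engagement_rate:.1%} {click_through_rate:.1%} {conversion_rate:.1%}")
```

The zero-denominator guard matters in practice: brand-new items have no impressions yet, and a division error there would break the ranking.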

🎯 Chapter 6: Choosing the Right System

Decision Framework

Use Content-Based When:

Examples: News articles, blog posts, jobs


Use Collaborative Filtering When:

Examples: Movies, music, products


Use Hybrid When:

Examples: E-commerce (like Amazon), streaming (like Netflix)


Use Social/Graph-Based When:

Examples: Social commerce, TikTok, Instagram Shopping


📚 Chapter 7: Learning Resources

Books

1. "Recommender Systems: The Textbook" by Charu Aggarwal

2. "Practical Recommender Systems" by Kim Falk

3. "Programming Collective Intelligence" by Toby Segaran

Online Courses

1. Coursera: "Recommender Systems" by University of Minnesota

2. YouTube: "StatQuest with Josh Starmer"

3. Google's Machine Learning Crash Course

Papers (Foundational)

1. "Amazon.com Recommendations: Item-to-Item Collaborative Filtering"

2. "The Netflix Prize" papers

3. "BPR: Bayesian Personalized Ranking from Implicit Feedback"

Websites

1. Towards Data Science (Medium)

2. Papers With Code

3. Google Research Blog


🧮 Chapter 8: Working Example (No Code!)

Scenario: Recommend Products for Alice

Alice's History:

Bought: iPhone ($999), AirPods ($199), MacBook ($1299)
Viewed: iPad, Apple Watch, iPhone Case
Searched: "wireless earbuds", "laptop accessories"
Budget range: $150-1500

Available Products:

1. Apple Watch ($399)
2. iPad ($329)
3. Samsung Phone ($899)
4. Laptop Stand ($49)
5. Wireless Keyboard ($129)
6. iPhone Case ($29)
7. AirPods Pro ($249)

Method 1: Content-Based Scoring

Step 1: Define Item Features

Apple Watch:
- Brand: Apple (1)
- Category: Electronics (1)
- Price Range: Mid ($399 in her range ✅)
- Compatibility: iPhone (1)

Samsung Phone:
- Brand: Samsung (0 - she buys Apple)
- Category: Electronics (1)
- Price Range: High ($899 ✅)
- Compatibility: Android (0)

Step 2: Calculate Similarity

Apple Watch vs Alice's preferences:
Brand match: 100% (all Apple)
Category match: 100% (all electronics)
Price match: 80% (slightly lower than average)
Compatibility: 100% (has iPhone)

Similarity Score = (100 + 100 + 80 + 100) / 4 = 95%

Samsung Phone:
Brand match: 0%
Category match: 100%
Price match: 90%
Compatibility: 0%

Similarity Score = (0 + 100 + 90 + 0) / 4 = 47.5%

Ranking:

  1. Apple Watch (95%)
  2. AirPods Pro (92%)
  3. iPad (88%)
  4. Samsung Phone (47.5%)
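
Optional, for readers who want to replay Method 1 in code: the similarity score is a plain average of the four match percentages. A minimal Python sketch using the numbers above (the match percentages themselves are the judgment calls from Step 2, not something the code derives):

```python
def content_score(matches):
    """Average of the per-feature match percentages."""
    return sum(matches.values()) / len(matches)


candidates = {
    "Apple Watch": {"brand": 100, "category": 100, "price": 80, "compatibility": 100},
    "Samsung Phone": {"brand": 0, "category": 100, "price": 90, "compatibility": 0},
}

ranking = sorted(candidates, key=lambda name: content_score(candidates[name]),
                 reverse=True)
for name in ranking:
    print(name, content_score(candidates[name]))
```

An unweighted average treats every feature as equally important; giving brand or compatibility a larger weight is a natural next refinement.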

Method 2: Collaborative Filtering

Step 1: Find Similar Users

Alice bought: [iPhone, AirPods, MacBook]

Bob bought: [iPhone, AirPods, MacBook, Apple Watch]
Similarity: 3/3 common items = 100% overlap!

Carol bought: [iPhone, Samsung Phone, Android Tablet]
Similarity: 1/3 common items = 33% overlap

Dan bought: [Dell Laptop, Android Phone]
Similarity: 0/3 common items = 0% overlap

Step 2: Recommend What Similar Users Bought

Bob (100% similar) also bought:
→ Apple Watch ✅ Strong recommendation!

Carol (33% similar) also bought:
→ Samsung Phone ❌ Weak recommendation

Dan (0% similar):
→ Ignore his purchases

Ranking:

  1. Apple Watch (Bob recommends, 100% similarity)
  2. iPad (viewed but not bought - weaker signal)
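
Optional: Method 2 as a short Python sketch. Similarity here is the share of Alice's purchases that the other user also bought, exactly as in Step 1; the function names are illustrative.

```python
def overlap(target, other):
    """Share of the target user's items that the other user also bought."""
    return len(target & other) / len(target) if target else 0.0


def recommend(target, others):
    """Score each unseen item by the best similarity among users who bought it."""
    scores = {}
    for basket in others.values():
        similarity = overlap(target, basket)
        if similarity == 0:
            continue  # users with nothing in common are ignored
        for item in basket - target:
            scores[item] = max(scores.get(item, 0.0), similarity)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


alice = {"iPhone", "AirPods", "MacBook"}
others = {
    "Bob": {"iPhone", "AirPods", "MacBook", "Apple Watch"},
    "Carol": {"iPhone", "Samsung Phone", "Android Tablet"},
    "Dan": {"Dell Laptop", "Android Phone"},
}

for item, similarity in recommend(alice, others):
    print(item, round(similarity, 2))
```

Apple Watch comes out on top because it is backed by the most similar user; Carol's items trail behind, and Dan's never appear.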

Method 3: Hybrid Approach (Best!)

Combine Both Methods:

Apple Watch:
- Content similarity: 95%
- Collaborative: 100% (Bob bought it)
- Final: (0.5 × 95) + (0.5 × 100) = 97.5 ⭐

iPad:
- Content similarity: 88%
- Collaborative: 50% (Alice viewed, no strong signal)
- Final: (0.5 × 88) + (0.5 × 50) = 69

Samsung Phone:
- Content similarity: 47.5%
- Collaborative: 33% (Carol bought, low similarity)
- Final: (0.5 × 47.5) + (0.5 × 33) = 40.25

Final Ranking:

  1. Apple Watch (97.5) ← Recommend this!
  2. AirPods Pro (92)
  3. iPad (69)
  4. Wireless Keyboard (55)
  5. Samsung Phone (40.25)

Adding More Factors

Recency Boost:

Apple Watch: Released 2 months ago → +5 points
iPad: Released 6 months ago → +3 points
Samsung Phone: Released 2 years ago → +0 points

Updated scores:
1. Apple Watch (102.5)
2. AirPods Pro (92)
3. iPad (72)

Social Proof:

Apple Watch: 4.8 stars, 10,000 reviews → +8 points
iPad: 4.7 stars, 8,000 reviews → +7 points
Samsung Phone: 4.5 stars, 5,000 reviews → +5 points

Final scores:
1. Apple Watch (110.5) ⭐⭐⭐
2. AirPods Pro (92)
3. iPad (79)
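
Optional: the whole scoring chain (hybrid score, then recency and social-proof boosts) in a few lines of Python, using the three products whose inputs are fully given above. The field names are illustrative.

```python
def final_score(product, content_weight=0.5):
    """Hybrid of content and collaborative scores, plus additive boosts."""
    hybrid = (content_weight * product["content"]
              + (1 - content_weight) * product["collab"])
    return hybrid + product["recency_boost"] + product["social_boost"]


products = {
    "Apple Watch": {"content": 95, "collab": 100, "recency_boost": 5, "social_boost": 8},
    "iPad": {"content": 88, "collab": 50, "recency_boost": 3, "social_boost": 7},
    "Samsung Phone": {"content": 47.5, "collab": 33, "recency_boost": 0, "social_boost": 5},
}

ranking = sorted(products, key=lambda name: final_score(products[name]), reverse=True)
for name in ranking:
    print(name, final_score(products[name]))
```

Additive boosts can push a score past 100, as they do for Apple Watch; that is harmless for ranking, but rescale if the number is ever shown to users as a percentage.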

💡 Key Takeaways

The Golden Rules

1. Simple Often Wins

2. Context Matters

3. Multiple Signals Are Better

4. Measure What Matters

5. Cold Start Is Hard


🎯 Summary Cheat Sheet

┌─────────────────────────────────────────────┐
│         Recommendation Method Picker        │
└─────────────────────────────────────────────┘

Have item features? → Content-Based
Have user behavior data? → Collaborative
Have both? → Hybrid ✅

Social platform? → Add social signals
Need explainability? → Content-Based
Want serendipity? → Collaborative

Cold start problem? → Content-Based first,
                      then Collaborative

Popular approach: Weighted Hybrid
= (Weight × Content) + (Weight × Collab) + 
  (Weight × Social) + (Weight × Recency)

You now understand recommendation systems from first principles! 🎓

Next steps:

  1. Re-read sections that were unclear
  2. Draw diagrams to visualize concepts
  3. Work through more examples on paper
  4. Apply to your Nexgate platform design

Remember: The best recommendation system is one that works for YOUR specific use case and users! 🚀