Nexgate Recommendation System - Architecture Documentation
Understanding Recommendation Systems - From Zero to Hero

Table of Contents
- Executive Summary
- System Overview
- Recommendation Strategies
- Embedding System
- Feed Algorithm
- Search System
- Group Buy Recommendations
- Installment Recommendations
- Technology Stack
- Implementation Phases
- Performance & Scaling
- Metrics & Monitoring
Executive Summary
What is Nexgate?
- E-commerce marketplace
- Social media feed experience
- Group buying (collective purchasing)
- Installment payment options
- Community engagement
Core Philosophy
"Discovery over Search"
- Users discover products through personalized feed (primary)
- Search is secondary enhancement feature
- Social proof drives purchasing decisions
- Affordability through group buying and installments
Recommendation System Goals
- Maximize Engagement: Keep users scrolling and interacting
- Drive Conversions: Turn views into purchases
- Build Community: Connect buyers through group purchases
- Enable Affordability: Help users buy premium products through installments
- Provide Value: Show relevant products at the right time
Key Success Metrics
- Feed Engagement Rate: Target > 5% (likes, comments, shares per view)
- Click-Through Rate: Target > 8% (feed → product page)
- Conversion Rate: Target > 3% (view → purchase)
- Average Session Time: Target > 10 minutes
- Group Buy Completion Rate: Target > 70% of groups reach goal
- Installment Adoption: Target > 40% of purchases over $500
System Overview
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                       USER INTERFACE                        │
│                                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │   Feed   │  │ Product  │  │  Search  │  │  Group   │     │
│  │  (Main)  │  │  Detail  │  │  (Text/  │  │   Buy    │     │
│  │          │  │   Page   │  │  Visual) │  │  Deals   │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           │ REST API / WebSocket
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                 SPRING BOOT BACKEND (Main)                  │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │           RECOMMENDATION ENGINE LAYER               │    │
│  │                                                     │    │
│  │  ┌───────────┐  ┌───────────┐  ┌──────────────┐     │    │
│  │  │   Feed    │  │  Search   │  │  Group Buy   │     │    │
│  │  │  Service  │  │  Service  │  │   Service    │     │    │
│  │  └───────────┘  └───────────┘  └──────────────┘     │    │
│  │                                                     │    │
│  │  ┌───────────┐  ┌───────────┐  ┌──────────────┐     │    │
│  │  │Similarity │  │Installment│  │  Engagement  │     │    │
│  │  │  Service  │  │  Service  │  │   Tracker    │     │    │
│  │  └───────────┘  └───────────┘  └──────────────┘     │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              CORE BUSINESS SERVICES                 │    │
│  │                                                     │    │
│  │  Product Service | User Service | Order Service     │    │
│  │  Social Graph | Embedding Client | Payment Service  │    │
│  └─────────────────────────────────────────────────────┘    │
└──────────────────────────┬──────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
          ▼                ▼                ▼
┌───────────────┐  ┌───────────────┐  ┌─────────────────┐
│  PostgreSQL   │  │     Redis     │  │     Python      │
│  + pgvector   │  │    (Cache)    │  │    Embedding    │
│               │  │               │  │     Service     │
│  • Products   │  │  • Sessions   │  │                 │
│  • Users      │  │  • Feed       │  │  • Sentence     │
│  • Orders     │  │  • Seen Posts │  │    Transformers │
│  • Posts      │  │  • Hot Data   │  │  • CLIP Model   │
│  • Embeddings │  │               │  │                 │
└───────────────┘  └───────────────┘  └─────────────────┘
Component Responsibilities
Frontend (Mobile/Web):
- Display personalized feed
- Handle user interactions (like, share, comment)
- Upload images for visual search
- Show group buy status and timers
- Display installment options
Spring Boot Backend:
- Core business logic
- Recommendation engine orchestration
- API endpoints
- Real-time notifications
- Payment processing
PostgreSQL + pgvector:
- Primary data storage
- Vector similarity search
- Transactional data
- User/product relationships
Redis:
- Session management
- Feed caching (short-lived)
- Seen posts tracking
- Real-time counters
- Rate limiting
Python Embedding Service:
- Text embedding generation
- Image embedding generation
- Runs as separate microservice
- Models stay loaded in memory
Recommendation Strategies Overview
Strategy Matrix
| Feature | Strategy | Uses Embeddings | Complexity | Priority |
|---|---|---|---|---|
| Main Feed | Hybrid Scoring | No | Medium | P0 (Critical) |
| Product Similarity | Vector Similarity | Yes (Text + Image) | High | P1 (Important) |
| Text Search | Semantic Matching | Yes (Text) | High | P1 (Important) |
| Visual Search | Image Matching | Yes (Image) | High | P2 (Nice to have) |
| Group Buy Match | Interest + Urgency | Partial | Medium | P0 (Critical) |
| Group Invites | Social Graph | No | Low | P1 (Important) |
| Installment Suggest | Budget Analysis | No | Low | P0 (Critical) |
| Trending | Engagement Metrics | No | Low | P0 (Critical) |
| Collaborative | Purchase Patterns | No | Medium | P1 (Important) |
When to Use What
Use Scoring System (No Embeddings):
- Main feed personalization
- Group buy recommendations
- Installment suggestions
- Trending content
- Social recommendations (follows, friends)
Use Embeddings:
- "Similar products" feature
- Semantic text search
- Visual search (upload image)
- Style/aesthetic matching
Use Collaborative Filtering:
- "Frequently bought together"
- "Customers also viewed"
- Cross-sell recommendations
- Friend activity
- Network purchasing patterns
- Group buy invitations
- Influencer recommendations
Detailed Recommendation Strategies
1. Main Feed Algorithm (Primary Experience)
Purpose: Personalized social commerce feed (like TikTok Shop/Instagram Shopping)
Composition Strategy:
Feed Mix (Per 20 posts):
├── 60% Posts from Following (12 posts)
│   • Sellers user follows
│   • Friends' posts
│   • Brands user engaged with
│
├── 25% Trending in User's Categories (5 posts)
│   • Hot products in categories user likes
│   • Viral posts with high engagement
│   • New products gaining traction
│
├── 10% Local Sellers (2 posts)
│   • Sellers in same city/region
│   • Faster delivery options
│   • Support local businesses
│
└── 5% Sponsored/Promoted (1 post)
    • Paid advertising
    • Featured products
    • Platform partnerships
Scoring Formula:
Each post receives a score from 0-100 based on weighted factors:
Post Score =
  (35% × Social Score) +
  (25% × Engagement Prediction Score) +
  (20% × Personalization Score) +
  (15% × Recency Score) +
  (5% × Quality Score)
Factor Breakdown:
A. Social Score (35 points max)
Following Seller: +15 points
Friends Engaged (per friend): +2 points (max 10)
Seller Verified: +5 points
Seller Rating (0-5 stars): +0 to 5 points
B. Engagement Prediction Score (25 points max)
Category Match (user's top 3 categories): +10 points
Price in User's Range: +8 points
Similar to Past Likes: +7 points
C. Personalization Score (20 points max)
Matches Recent Searches (per keyword): +2 points (max 8)
Local Seller (same city): +7 points
Matches User Preferences: +5 points
D. Recency Score (15 points max)
< 1 hour old: 15 points
< 6 hours old: 12 points
< 24 hours old: 8 points
< 3 days old: 4 points
Older: 1 point
E. Quality Score (5 points max)
3+ Images: +2 points
Detailed Description (>100 chars): +1 point
High Engagement Rate (>10%): +2 points
Feed Generation Process:
Step 1: Gather Candidates
├── Query posts from following (last 7 days)
├── Query trending posts in user's categories
├── Query local sellers' posts
└── Query sponsored posts
→ Result: ~100-200 candidate posts
Step 2: Filter
├── Remove posts user already saw (Redis cache)
├── Remove out-of-stock products
├── Remove blocked/reported sellers
└── Remove duplicate products
→ Result: ~80-150 eligible posts
Step 3: Score Each Post
├── Calculate social score
├── Calculate engagement prediction
├── Calculate personalization
├── Calculate recency
└── Calculate quality
→ Result: Each post has score 0-100
Step 4: Rank & Mix
├── Sort by score (highest first)
├── Apply diversity rules:
│   • Max 3 consecutive posts from same seller
│   • Max 5 posts from same category in 20
│   • Insert 1 sponsored post at position 7-10
└── Apply freshness shuffle (boost 2-3 recent high-quality posts)
→ Result: Final ranked list
Step 5: Pagination
├── Return top 20 posts
├── Generate cursor (ID of 20th post)
├── Cache seen post IDs (Redis, 30 days TTL)
└── Track impressions for analytics
Optimization Strategies:
Caching:
- User's following list: Redis, 1 hour TTL
- User's interested categories: Redis, 6 hours TTL
- Trending posts: Redis, 5 minutes TTL
- User's seen posts: Redis, 30 days TTL
Pre-computation:
- Trending posts calculated every 5 minutes (background job)
- User interest profiles updated daily (background job)
- Seller reputation scores updated hourly
Feed Diversity:
Diversity Rules:
• No more than 3 consecutive posts from same seller
• Category distribution: At least 3 different categories in top 10
• Price variance: Mix of low/mid/high price points
• Content type mix: Products, tutorials, reviews, testimonials
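One of the rules above (no more than 3 consecutive posts from the same seller) can be sketched as a greedy pass over the ranked list. The post shape (a dict with a `"seller"` key) is hypothetical; this is a sketch of the idea, not the production diversity mixer.

```python
# Sketch: defer posts that would extend a same-seller run past max_run.

def enforce_seller_gap(ranked, max_run=3):
    result, deferred = [], []
    run_seller, run_len = None, 0
    pending = list(ranked)
    while pending:
        post = pending.pop(0)
        if post["seller"] == run_seller and run_len >= max_run:
            deferred.append(post)   # would be the 4th in a row: push back
            continue
        result.append(post)
        run_len = run_len + 1 if post["seller"] == run_seller else 1
        run_seller = post["seller"]
        # A seller change lets deferred posts back into contention.
        if deferred and deferred[0]["seller"] != run_seller:
            pending = deferred + pending
            deferred = []
    return result + deferred  # leftovers (if any) go to the end

feed = [{"seller": s} for s in ["a", "a", "a", "a", "b", "a"]]
print([p["seller"] for p in enforce_seller_gap(feed)])  # ['a', 'a', 'a', 'b', 'a', 'a']
```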
2. Product Similarity (Vector Embeddings)
Purpose: Show similar products on product detail page
Where Used:
- Product detail page "Similar Products" section
- "You might also like" recommendations
- Alternative suggestions when out of stock
How It Works:
Text-Based Similarity:
Product: "Red Nike Running Shoes"
  ↓
Generate text embedding from:
"Red Nike Running Shoes. Comfortable athletic footwear
designed for daily training and jogging. Category: Footwear."
  ↓
Embedding: [0.23, 0.87, 0.12, ... 384 numbers]
  ↓
Store in product.text_embedding column
  ↓
When user views product:
→ Compare this embedding with all other products
→ Find 10 closest matches using cosine similarity
→ Display as "Similar Products"
Image-Based Similarity:
Product Image: [actual shoe photo]
  ↓
Download image and process pixels
  ↓
Generate image embedding using CLIP:
[0.91, 0.12, 0.45, ... 512 numbers]
  ↓
Store in product.image_embedding column
  ↓
When user views product:
→ Compare this embedding with all other products
→ Find 10 visually similar matches
→ Display as "Visually Similar"
Combined Similarity:
Option to blend both:
Combined Score = (0.6 × Text Similarity) + (0.4 × Image Similarity)
Example:
Product A vs Product B:
• Text similarity: 0.85 (both describe athletic shoes)
• Image similarity: 0.72 (both look sporty)
• Combined: (0.6 × 0.85) + (0.4 × 0.72) = 0.51 + 0.29 = 0.80
Result: 80% similarity → Good recommendation!
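The blend above is a one-line helper; this sketch reproduces the worked example (the function name is illustrative, and the 0.6/0.4 weights come from the formula):

```python
# Blend text and image similarity with the 0.6/0.4 weights above.

def combined_similarity(text_sim: float, image_sim: float,
                        w_text: float = 0.6, w_image: float = 0.4) -> float:
    return w_text * text_sim + w_image * image_sim

score = combined_similarity(0.85, 0.72)
print(round(score, 2))  # 0.8 -- the "80% similarity" in the example
```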
When to Use Which:
Text Embedding (Semantic Similarity):
- User needs products with similar function
- Finding alternatives/substitutes
- Matching by features and description
- Example: "wireless earbuds" → finds AirPods, Galaxy Buds, etc.
Image Embedding (Visual Similarity):
- User cares about style/appearance
- Fashion and design-focused products
- Color and aesthetic matching
- Example: Minimalist white sneakers → finds all similar looking white shoes
Performance Considerations:
Similarity Search Performance:
├── Without index: ~2-5 seconds (scan all products)
├── With ivfflat index: ~50-200ms (much faster!)
└── With caching: ~10-30ms (best!)
Optimization:
• Create pgvector index on embedding columns
• Cache similar products per product (24 hour TTL)
• Pre-compute similarities for popular products
• Limit search to same category first (faster)
3. Search System
Purpose: Help users find products through text or image queries
A. Text Search (Semantic)
Traditional Keyword Search (What we're improving):
User searches: "comfortable work shoes"
  ↓
System finds: Products containing words "comfortable" AND "work" AND "shoes"
  ↓
Problems:
• Misses "office footwear" (different words, same meaning)
• Misses "business casual sneakers" (related but no exact match)
• Can't understand intent or context
Semantic Search (With Embeddings):
User searches: "comfortable work shoes"
  ↓
Generate embedding for query:
"comfortable work shoes" → [0.45, 0.78, 0.23, ...]
  ↓
Compare with ALL product text embeddings
  ↓
Find closest matches:
• "Business casual sneakers" ✓ (similarity: 0.89)
• "Office-appropriate footwear" ✓ (similarity: 0.86)
• "Professional dress shoes" ✓ (similarity: 0.82)
  ↓
Understands meaning, not just keywords!
Search Process:
Step 1: User Types Query
"gift for dad who likes tech"
โ
Step 2: Generate Query Embedding
Call Python service: "gift for dad who likes tech"
โ Returns: [0.34, 0.67, 0.91, ... 384 numbers]
โ
Step 3: Similarity Search
Compare query embedding with product text embeddings
Use cosine similarity to find closest matches
โ
Step 4: Ranking & Filtering
โข Base ranking: Similarity score
โข Boost: In stock items (+10%)
โข Boost: Popular items (+5%)
โข Boost: New arrivals (+5%)
โข Filter: Remove out of budget (if known)
โ
Step 5: Return Results
Top 20 most relevant products
Search Enhancements:
Query Understanding:
├── Intent detection: "gift" → Show gift-appropriate items
├── Context: "for dad" → Skew towards masculine products
├── Category hint: "tech" → Filter to electronics/gadgets
└── Budget inference: Previous searches/purchases
Ranking Factors:
├── Semantic relevance (primary): 60%
├── Popularity/sales: 20%
├── Recency (new products): 10%
└── User's past preferences: 10%
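The Step 4 boosts can be read as multiplicative bumps on top of the base similarity score. A minimal sketch, assuming boolean product flags (`in_stock`, `popular`, `new_arrival` are hypothetical field names):

```python
# Sketch of the Step 4 ranking boosts: +10% in stock, +5% popular,
# +5% new arrival, applied on top of the base similarity score.

def rank_score(similarity: float, in_stock: bool,
               popular: bool, new_arrival: bool) -> float:
    score = similarity
    if in_stock:
        score *= 1.10   # +10%
    if popular:
        score *= 1.05   # +5%
    if new_arrival:
        score *= 1.05   # +5%
    return score

# An in-stock, popular product can edge out a slightly closer match
# that is out of stock:
print(rank_score(0.86, True, True, False) > rank_score(0.89, False, False, False))  # True
```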
B. Visual Search (Image Upload)
Purpose: Find products by uploading a photo
How It Works:
Step 1: User Sees Product in Real Life
User sees cool shoes on Instagram/street
Takes photo or saves image
  ↓
Step 2: Upload to Nexgate
User opens app → Camera icon → Upload photo
  ↓
Step 3: Generate Image Embedding
Send image bytes to Python CLIP service
Process actual pixels (not URL or filename!)
→ Returns: [0.91, 0.23, 0.67, ... 512 numbers]
  ↓
Step 4: Find Visually Similar Products
Compare upload embedding with all product image embeddings
Use cosine similarity
  ↓
Step 5: Return Results
"Found 20 similar items"
Show visually matching products
Include similarity % for transparency
Visual Search Use Cases:
Use Case 1: "Find This Exact Product"
User uploads: Photo of specific Nike shoe
System finds: That exact shoe (if in catalog)
Or: Visually identical alternatives
Use Case 2: "Find Similar Style"
User uploads: Minimalist white sneakers
System finds: All minimal white shoe styles
From different brands
Use Case 3: "Shop the Look"
User uploads: Full outfit photo
System identifies: Individual items
Shows: Matching products for each piece
Use Case 4: "Find Better Price"
User uploads: Expensive designer bag
System finds: Similar looking bags
At various price points
Performance:
Visual Search Speed:
├── Image upload: ~1 second
├── Embedding generation: ~200-300ms
├── Similarity search: ~100-200ms (with index)
└── Total: ~1.5 seconds (acceptable!)
Optimization:
• Compress uploaded images (reduce transfer time)
• Process images at consistent size (224x224 for CLIP)
• Cache embeddings for popular uploaded images
• Use GPU for embedding generation (5x faster)
4. Group Buy Recommendations
Purpose: Connect users with active group purchases and maximize group completion rates
A. Find Groups to Join
Scenario:
User viewing: iPhone 15 ($999)
User thinking: "Too expensive..."
  ↓
System shows:
"💡 Join a group buy and save $150!
👥 3 active iPhone groups:
• Group A: 8/10 people, 3 hours left → JOIN
• Group B: 5/10 people, 24 hours left → JOIN
• Group C: 2/10 people, 48 hours left → JOIN"
Recommendation Logic:
Finding Relevant Groups:
Step 1: Identify User's Interest
User action: Viewed/searched/added to cart iPhone
Interest level: High
  ↓
Step 2: Find Active Groups
Query all groups:
• Product: iPhone 15 (exact match)
• Status: Active (not expired/completed)
• Not full yet (has open spots)
  ↓
Step 3: Score Each Group
Group Score =
  (0.40 × Product Match) +
  (0.30 × Urgency/Time Left) +
  (0.20 × Fill Rate) +
  (0.10 × Discount Size)
Example:
Group A:
• Product Match: 100% (exact iPhone)
• Urgency: 90% (3 hours left = high urgency)
• Fill Rate: 80% (8/10 = almost there!)
• Discount: 15% ($150 savings)
Score = (0.40×100) + (0.30×90) + (0.20×80) + (0.10×15)
      = 40 + 27 + 16 + 1.5 = 84.5 ⭐ TOP RECOMMENDATION!
Group C:
• Product Match: 100%
• Urgency: 20% (48 hours = low urgency)
• Fill Rate: 20% (2/10 = just started)
• Discount: 15%
Score = (0.40×100) + (0.30×20) + (0.20×20) + (0.10×15)
      = 40 + 6 + 4 + 1.5 = 51.5
  ↓
Step 4: Rank & Display
Show Group A first (highest score + urgency)
Display countdown timer
Show social proof (8 people already joined!)
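The group-scoring step above is a plain weighted sum; this sketch reproduces the Group A and Group C numbers (the function name and weight dictionary are illustrative):

```python
# Group Score = weighted sum of match, urgency, fill rate, discount.

WEIGHTS = {"match": 0.40, "urgency": 0.30, "fill": 0.20, "discount": 0.10}

def group_score(match, urgency, fill, discount):
    return (WEIGHTS["match"] * match + WEIGHTS["urgency"] * urgency
            + WEIGHTS["fill"] * fill + WEIGHTS["discount"] * discount)

group_a = group_score(100, 90, 80, 15)   # 8/10 people, 3 hours left
group_c = group_score(100, 20, 20, 15)   # 2/10 people, 48 hours left
print(round(group_a, 1), round(group_c, 1))  # 84.5 51.5
```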
Similar Product Groups (Using Embeddings):
Scenario: User wants iPhone but no active iPhone groups
System finds:
"No active iPhone groups, but:
📱 Samsung S24 Group: 6/10, 5 hours left → JOIN
📱 Google Pixel Group: 4/8, 12 hours left → JOIN"
How: Use text embeddings to find similar products
iPhone embedding ≈ Samsung/Google phone embeddings
Show groups for similar high-end smartphones
Urgency Triggers:
Smart Notifications:
When group is 80% full:
→ "⚡ Only 2 spots left in iPhone group! Join now!"
When group has < 6 hours left:
→ "⏰ Group closes in 5 hours! Don't miss 15% off!"
When user's friend joins:
→ "🤝 Sarah just joined iPhone group! Join her?"
When price drops to user's budget:
→ "💰 Now $849 with group buy - fits your budget!"
B. Invite Friends to Your Group
Scenario:
User creates group: "MacBook Pro - Need 10 people"
Current: 3/10 members (user + 2 others)
  ↓
System suggests who to invite
Invitation Recommendation Logic:
Finding Best Invitees:
Step 1: Candidate Pool
├── User's followers
├── User's friends (mutual follows)
├── People user previously bought with
└── Active users who viewed similar products
Step 2: Score Each Potential Invitee
Invite Score =
  (0.35 × Product Interest) +
  (0.25 × Social Connection) +
  (0.20 × Past Group Activity) +
  (0.15 × Budget Match) +
  (0.05 × Platform Activity)
Example - Inviting John:
• Product Interest: 90% (viewed MacBooks 3 times this month)
• Social Connection: 80% (mutual friend, engaged with user's posts)
• Past Group Activity: 70% (joined 2 groups before, completed both)
• Budget Match: 85% (bought items in $800-1500 range)
• Platform Activity: 95% (logs in daily, active buyer)
Score = (0.35×90) + (0.25×80) + (0.20×70) + (0.15×85) + (0.05×95)
      = 31.5 + 20 + 14 + 12.75 + 4.75 = 83 ⭐ EXCELLENT MATCH!
  ↓
Step 3: Personalized Invitation Message
"Invite John:
💡 He viewed MacBook Pro recently
👥 Mutual friend with Sarah (already in group)
⭐ Reliable (completed 2 past groups)"
[SEND INVITE]
Smart Messaging:
To close friend:
"Hey! I'm buying a MacBook with 9 others. Join us?
We save $200 each! Only 1 spot left."
To network connection:
"Group buy for MacBook Pro
3 of your friends already joined
Save 15% | 6 spots left"
To stranger (similar interests):
"MacBook Pro group buy
Popular in tech community
Join 9 others | Save $200"
C. Complete the Group (Urgency Marketing)
Scenario:
Group status: 8/10 members, 4 hours left
Need: 2 more people to unlock deal
Risk: Group expires without completion
Completion Strategy:
System Actions:
Action 1: Broadcast to High-Intent Users
Target:
├── Users who viewed this product
├── Users with product in cart
├── Users who joined similar groups
└── Friends of current members
Notification:
"⚡ URGENT: 2 spots left!
iPhone group closes in 4 hours
Join now and save $150"
Action 2: Discount Boost (If needed)
If group stuck at 80% for 2+ hours:
→ Platform adds extra 5% discount
→ "New deal: Save $150 → Save $170!"
→ Creates momentum
Action 3: Social Pressure
Show current members:
"Your group needs 2 more people
Share with friends to complete deal
[SHARE ON WHATSAPP] [SHARE ON INSTAGRAM]"
Action 4: Fallback Options
If group fails:
→ "Sorry, group didn't complete
💡 Join this similar group instead: [...]
OR: Get notified when new iPhone group starts"
Group Recommendation Priorities:
Priority Matrix:
Urgent + Almost Full = Highest Priority
├── 90% full + < 6 hours left
├── Show on feed prominently
├── Push notifications
└── Email reminders
High Interest Match = High Priority
├── Exact product user wants
├── Similar products (embeddings)
├── Category match
└── Show in search results
Social Connection = Medium Priority
├── Friends in group
├── Followed sellers
├── Past group partners
└── Show in "Friends' Activity"
General Discovery = Low Priority
├── Trending groups
├── Category exploration
├── New group announcements
└── Show in feed occasionally
5. Installment Recommendations
Purpose: Make expensive products affordable through payment plans
Key Principle: Show the right payment option at the right moment
A. "You Can Afford This!"
Scenario:
User profile:
โข Average purchase: $50-200
โข Max purchase: $400
โข Never bought > $600
User views: MacBook Pro ($1,299)
Typical reaction: "Too expensive, can't afford"
Installment Intervention:
System detects:
Product price ($1,299) > User max purchase ($400) × 3
  ↓
Calculate monthly payment:
$1,299 ÷ 12 months = $108/month
  ↓
Check affordability:
$108/month < User max purchase ($400)? ✓ YES
  ↓
Show prominent banner:
┌─────────────────────────────────────┐
│ 💡 You can afford this!             │
│ Pay just $108/month for 12 months   │
│  • No interest                      │
│  • Pay ahead anytime                │
│  • Cancel anytime                   │
│  [SEE PAYMENT OPTIONS]              │
└─────────────────────────────────────┘
Affordability Logic:
Trigger Rules:
Show Installment When:
1. Product price > (User's average purchase × 3)
2. Monthly payment < (User's max purchase × 1.5)
3. Product in user's interested categories
4. User has good payment history (if existing customer)
Example:
Product: $1,299
User avg: $150
User max: $400
Check 1: $1,299 > ($150 × 3) = $1,299 > $450 ✓
Check 2: $108/month < ($400 × 1.5) = $108 < $600 ✓
→ SHOW INSTALLMENT OPTION!
Don't Show If:
• Product too cheap (< $200)
• User can afford full price easily
• Monthly payment still too high for user
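The numeric trigger rules above can be sketched as two checks plus a price floor; the function name and `min_price=200` parameter are illustrative, and the 3x/1.5x multipliers come from the rules:

```python
# Sketch of the affordability trigger, using the $1,299 MacBook example.

def should_show_installment(price, avg_purchase, max_purchase,
                            months=12, min_price=200):
    if price < min_price:
        return False                        # product too cheap
    monthly = price / months
    check1 = price > avg_purchase * 3       # notably above usual spend
    check2 = monthly < max_purchase * 1.5   # monthly payment is reachable
    return check1 and check2

print(should_show_installment(1299, avg_purchase=150, max_purchase=400))  # True
print(should_show_installment(150, avg_purchase=150, max_purchase=400))   # False
```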
Payment Plan Recommendations:
Personalized Plans:
For Budget-Conscious User:
"12 months at $108/month" (smallest monthly)
For Established User:
"6 months at $216/month" (faster payoff)
For Premium User:
"3 months at $433/month" (minimize interest duration)
Dynamic Suggestion:
Based on user's typical payment speed:
• Pays bills early → Suggest shorter term
• Budget constrained → Suggest longer term
• First-time → Suggest flexible middle option
B. Premium Product Feed
Scenario:
User interested in: Laptops
User budget: $300-500
Problem: Premium laptops cost $1000+
Solution - Installment Feed Section:
┌─────────────────────────────────────┐
│ Premium Products - Easy Payments    │
├─────────────────────────────────────┤
│ MacBook Pro                         │
│ $1,299 → Just $108/month            │
│ ⭐ 4.9 stars | 🔥 Hot deal          │
│ "Fits your monthly budget!"         │
├─────────────────────────────────────┤
│ Dell XPS 15                         │
│ $1,199 → Just $100/month            │
│ ⭐ 4.8 stars | 💼 Professional      │
│ "Popular in your network"           │
└─────────────────────────────────────┘
Feed Scoring for Installment Products:
Installment Product Score =
  (0.40 × Category Interest) +
  (0.30 × Affordability with Installment) +
  (0.20 × Product Quality) +
  (0.10 × Social Proof)
Example - MacBook for User:
• Category: 95% (loves tech, views laptops often)
• Affordability: 90% ($108/month fits budget)
• Quality: 98% (4.9 star rating, premium brand)
• Social: 75% (3 friends own MacBooks)
Score = (0.40×95) + (0.30×90) + (0.20×98) + (0.10×75)
      = 38 + 27 + 19.6 + 7.5 = 92.1 ⭐⭐⭐
This scores higher than cheaper laptops because:
• Installments make it affordable
• Matches user's aspirations
• Higher quality product
C. Upgrade Suggestions
Scenario:
User adding to cart: iPhone 15 ($799)
Better option exists: iPhone 15 Pro ($999)
Difference: $200
Upgrade Recommendation:
┌─────────────────────────────────────┐
│ 💡 Upgrade to iPhone 15 Pro?        │
│                                     │
│ Current choice: $799                │
│ iPhone 15 Pro: $999 (+$200)         │
│                                     │
│ With 12-month plan:                 │
│  • Standard: $67/month              │
│  • Pro: $83/month (+$16)            │
│                                     │
│ Pro benefits:                       │
│  • Better camera system             │
│  • Titanium design                  │
│  • More storage                     │
│                                     │
│ [UPGRADE FOR JUST $16/MONTH]        │
└─────────────────────────────────────┘
Upgrade Logic:
When to Suggest Upgrade:
Condition 1: Premium version exists
iPhone 15 → iPhone 15 Pro ✓
Condition 2: Monthly difference is small
Standard: $67/month
Pro: $83/month
Difference: $16/month (< $20) ✓
Condition 3: User can afford it
User's budget allows +$16/month ✓
Condition 4: Meaningful upgrade
Pro version has substantial improvements ✓
→ SHOW UPGRADE SUGGESTION!
Don't Suggest If:
• Difference > $30/month (too much)
• Upgrade is minimal (not worth it)
• User explicitly chose budget option
• User on tightest plan already
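Condition 2 above reduces to comparing the monthly delta against a small threshold. A minimal sketch, assuming the $20/month cutoff and 12-month term from the example (the function name is hypothetical):

```python
# Sketch: suggest the premium variant when the monthly difference is small.

def suggest_upgrade(base_price, pro_price, months=12, max_monthly_diff=20):
    monthly_diff = (pro_price - base_price) / months
    return monthly_diff < max_monthly_diff

print(suggest_upgrade(799, 999))    # True  (~$16.67/month extra)
print(suggest_upgrade(799, 1099))   # False ($25/month extra)
```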
D. Bundle Recommendations
Scenario:
User purchased: MacBook Pro ($1,299)
Payment plan: $108/month for 12 months
Bundle Suggestion:
┌─────────────────────────────────────┐
│ 📦 Complete Your Setup              │
│                                     │
│ Add to your payment plan:           │
├─────────────────────────────────────┤
│ 🖱️ Magic Mouse                      │
│ $79 → +$7/month                     │
│ New total: $115/month               │
├─────────────────────────────────────┤
│ ⌨️ Magic Keyboard                   │
│ $99 → +$8/month                     │
│ New total: $123/month               │
├─────────────────────────────────────┤
│ 💼 Laptop Sleeve                    │
│ $49 → +$4/month                     │
│ New total: $112/month               │
│                                     │
│ 💡 Still within your budget!        │
└─────────────────────────────────────┘
Bundle Logic:
Accessory Recommendation Rules:
Rule 1: Complementary Products
Main: MacBook
Suggest: Mouse, keyboard, bag, USB-C accessories
(Use collaborative filtering: "bought together")
Rule 2: Affordable Addition
Current: $108/month
User budget: Up to $200/month
Available: $92/month for accessories
→ Suggest items totaling < $90/month
Rule 3: Prioritize by Value
High priority:
• Essential (mouse, charger)
• High satisfaction (4.5+ stars)
• Popular (many bought together)
Low priority:
• Optional accessories
• Lower rated items
• Expensive add-ons
Rule 4: Convenience Factor
"Add all 3 accessories for +$19/month
Save time, get complete setup!"
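Rule 2 above is a headroom filter: accessories qualify only if their monthly share fits the budget left after the main purchase. A sketch using the MacBook example's numbers (the function and list names are illustrative):

```python
# Sketch of Rule 2: keep add-ons whose monthly cost fits the headroom
# (budget - current monthly payment), spread over the same term.

def affordable_addons(addons, budget_monthly, current_monthly, months=12):
    headroom = budget_monthly - current_monthly
    return [(name, round(price / months, 2)) for name, price in addons
            if price / months <= headroom]

addons = [("Magic Mouse", 79), ("Magic Keyboard", 99), ("Laptop Sleeve", 49)]
# $200/month budget, $108/month committed -> $92/month headroom:
print(affordable_addons(addons, budget_monthly=200, current_monthly=108))
# [('Magic Mouse', 6.58), ('Magic Keyboard', 8.25), ('Laptop Sleeve', 4.08)]
```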
E. Payment Flexibility Highlights
Key Feature: Users can pay ahead on installments
Recommendation Angle:
┌─────────────────────────────────────┐
│ 💰 Flexible Payments                │
│                                     │
│ Start at $108/month                 │
│  • Got a bonus? Pay extra anytime   │
│  • Finish early, save on duration   │
│  • No penalties for early payment   │
│  • Pause if needed (1 month grace)  │
│                                     │
│ "Pay at your own pace!"             │
└─────────────────────────────────────┘
When to Highlight:
For New Users:
"Flexible payment plans available
Pay early, pay later - you choose"
For Hesitant Users:
"Not sure? Start with small payments
You can always pay more when ready"
For Seasonal Workers:
"Pay extra during busy season
Reduce payments during slow months"
For Goal-Oriented Users:
"Finish your payments early
Track progress in dashboard"
6. Combined Recommendations (Group Buy + Installments)
The Ultimate Deal Strategy
Scenario:
Product: MacBook Pro
Regular price: $1,299
Group buy discount: $200
Installment option: Available
Combined Offer:
┌─────────────────────────────────────┐
│ 🎯 BEST DEAL COMBO!                 │
│                                     │
│ MacBook Pro                         │
│ Regular: $1,299                     │
│                                     │
│ 🤝 Join Group: $1,099 (Save $200!)  │
│ 💳 + Pay Monthly: $92/month         │
│                                     │
│ 👥 7/10 people | ⏰ 6 hours left    │
│                                     │
│ ✨ Best combo deal:                 │
│  • Save $200 with group             │
│  • Just $92/month affordable        │
│  • No interest                      │
│                                     │
│ [JOIN GROUP + START PAYMENTS]       │
└─────────────────────────────────────┘
Combined Score Formula:
Ultimate Deal Score =
  (0.25 × Category Interest) +
  (0.25 × Affordability Boost) +
  (0.20 × Group Discount Size) +
  (0.15 × Social Proof) +
  (0.10 × Urgency) +
  (0.05 × Past Group Success)
Example - MacBook for User:
• Category: 95% (loves tech)
• Affordability: 95% ($92/month from $108!)
• Discount: 100% ($200 is huge savings)
• Social: 70% (7 people joined)
• Urgency: 90% (6 hours left!)
• Past: 80% (completed 2 groups)
Score = (0.25×95) + (0.25×95) + (0.20×100) +
        (0.15×70) + (0.10×90) + (0.05×80)
      = 23.75 + 23.75 + 20 + 10.5 + 9 + 4
      = 91 ⭐⭐⭐ MAXIMUM APPEAL!
Why Combined Works:
Psychological Triggers:
1. Double Savings:
"Save $200 + Pay less monthly = Win-Win"
2. Urgency:
"Group closes in 6 hours! Act now!"
3. Social Proof:
"7 others already in! Don't miss out!"
4. Affordability:
"$92/month fits your budget easily"
5. Low Risk:
"Flexible payments + Group guarantee"
Conversion Multiplier:
• Group buy alone: 8% conversion
• Installment alone: 15% conversion
• COMBINED: 25-30% conversion! 🚀
Feed Placement:
Priority Positioning:
Top of Feed:
• Active group + installment combos
• Closing soon (<6 hours)
• User's interested categories
Mid-Feed:
• New group + installment deals
• Popular combos (high join rate)
• Category exploration
Bottom Feed:
• General group buy awareness
• Installment education
• Success stories
Embedding System Deep Dive
What Are Embeddings? (Conceptual Understanding)
Simple Analogy:
Think of embeddings as "coordinates" for meaning:
Words/Images → Machine Learning Model → Numbers (coordinates)
Similar meanings → Similar coordinates → Close together in space
Example in 2D (real embeddings are 384 or 512 dimensions!):
"Red Shoes" → Point (0.8, 0.2)
"Crimson Sneakers" → Point (0.82, 0.18) ← Close!
"Blue Laptop" → Point (0.1, 0.9) ← Far away!
Distance between "Red Shoes" and "Crimson Sneakers" is small
→ They're similar!
Architecture of Embedding System
┌───────────────────────────────────────────────┐
│              SPRING BOOT BACKEND              │
│                                               │
│  Need embedding for product/search query?     │
│    ↓                                          │
│  HTTP REST Call to Python Service             │
│  POST /embed-text or POST /embed-image        │
│  Body: {text: "..."} or {image_bytes: "..."}  │
└───────────────────────┬───────────────────────┘
                        │
                        │ HTTP Request
                        ▼
┌───────────────────────────────────────────────┐
│           PYTHON EMBEDDING SERVICE            │
│             (Flask Microservice)              │
│                                               │
│  ┌─────────────────────────────────────────┐  │
│  │  Models Loaded in Memory (at startup):  │  │
│  │                                         │  │
│  │  • Sentence Transformers                │  │
│  │    (all-MiniLM-L6-v2)                   │  │
│  │    → Text embeddings (384 dims)         │  │
│  │                                         │  │
│  │  • CLIP                                 │  │
│  │    (openai/clip-vit-base-patch32)       │  │
│  │    → Image embeddings (512 dims)        │  │
│  └─────────────────────────────────────────┘  │
│                                               │
│  Receives request → Processes → Returns array │
└───────────────────────┬───────────────────────┘
                        │
                        │ Response: [0.23, 0.87, ...]
                        ▼
┌───────────────────────────────────────────────┐
│              SPRING BOOT BACKEND              │
│                                               │
│  Receives embedding array                     │
│    ↓                                          │
│  Stores in PostgreSQL (vector column)         │
│  OR: Uses for similarity search               │
└───────────────────────────────────────────────┘
When Embeddings Are Generated
Trigger 1: Product Creation/Update
Flow:
Admin adds product
→ Product saved to database (ID assigned)
→ Background async job triggered
→ Generate text embedding (name + description)
→ Generate image embedding (from image URL)
→ Update product record with both embeddings
→ Product now searchable by similarity!
Timeline:
• Product creation: Immediate (< 100ms)
• Embedding generation: Background (2-5 seconds)
• Total user-facing delay: None (async)
Trigger 2: Search Query
Flow:
User types search query
→ Frontend sends to backend
→ Backend calls Python service
→ Python generates query embedding
→ Backend compares with product embeddings
→ Returns similar products
→ Frontend displays results
Timeline:
• User types → Results: ~500-800ms
• Embedding generation: ~50-100ms
• Search: ~100-200ms
• Network + rendering: ~300-500ms
Trigger 3: Image Upload (Visual Search)
Flow:
User uploads image
→ Frontend sends image bytes
→ Backend calls Python service
→ Python generates image embedding
→ Backend finds similar product images
→ Returns visually similar products
Timeline:
• Image upload: ~500ms-1s (depends on size/network)
• Embedding generation: ~200-400ms
• Search: ~100-200ms
• Total: ~1-2 seconds (acceptable for visual search)
Trigger 4: Batch Processing
For existing products without embeddings:
Flow:
Admin triggers batch job
→ Fetch products in batches (100 at a time)
→ Generate embeddings for batch
→ Save all embeddings
→ Repeat for next batch
Timeline:
• 100 products: ~10-15 seconds
• 10,000 products: ~15-20 minutes
• Run once, or nightly for new products
Embedding Storage
Data Stored:
Product Table includes:
├── id (primary key)
├── name, description, price, etc.
├── text_embedding (vector type, 384 dimensions)
└── image_embedding (vector type, 512 dimensions)
Each embedding stored as array of floats:
text_embedding: [0.234, 0.876, 0.123, ... 384 numbers]
image_embedding: [0.912, 0.234, 0.567, ... 512 numbers]
Storage per product:
• Text: 384 floats × 4 bytes = 1.5 KB
• Image: 512 floats × 4 bytes = 2.0 KB
• Total: 3.5 KB per product
For 10,000 products: 35 MB (tiny!)
For 1,000,000 products: 3.5 GB (manageable!)
Similarity Search Process
How Similarity Is Calculated:
Cosine Similarity Formula:
similarity = (A · B) / (||A|| × ||B||)
Where:
A · B = dot product (sum of the pairwise products)
||A|| = magnitude (length) of vector A
||B|| = magnitude (length) of vector B
Result: a number between -1 and 1; for typical embedding pairs it falls between 0 (unrelated) and 1 (identical direction)
Example:
Product A: [0.8, 0.6]
Product B: [0.9, 0.5]
A · B = (0.8 × 0.9) + (0.6 × 0.5) = 0.72 + 0.30 = 1.02
||A|| = √(0.8² + 0.6²) = √1.00 = 1.0
||B|| = √(0.9² + 0.5²) = √1.06 ≈ 1.03
similarity = 1.02 / (1.0 × 1.03) ≈ 0.99 (very similar!)
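The formula can be written directly in Python; a minimal sketch that reproduces the worked example above:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """similarity = (A · B) / (||A|| × ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Worked example from above
print(round(cosine_similarity([0.8, 0.6], [0.9, 0.5]), 2))  # 0.99
```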
Search Performance:
Without Index:
• Compares the query with every product (full scan)
• 10,000 products: ~2-5 seconds
• 100,000 products: ~20-50 seconds
• Not acceptable!
With pgvector Index (ivfflat):
• Uses approximate nearest neighbor (ANN) search
• Clusters similar vectors together
• Only searches the relevant clusters
• 10,000 products: ~50-100 ms
• 100,000 products: ~100-300 ms
• 1,000,000 products: ~200-500 ms
• Acceptable! ✓
Trade-off:
• Exact results vs ~95-99% recall
• Well worth it for the speed improvement
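A sketch of how the index might be created, assuming a `products` table with a `text_embedding vector(384)` column (the table, column, and index names are illustrative); the helper applies the √n heuristic described later under Index Tuning:

```python
import math

def ivfflat_lists(product_count: int) -> int:
    """Heuristic from the text: lists ≈ √(number of products)."""
    return max(1, round(math.sqrt(product_count)))

# Assumed schema: products.text_embedding vector(384)
CREATE_INDEX_SQL = f"""
CREATE INDEX idx_products_text_embedding
    ON products
    USING ivfflat (text_embedding vector_cosine_ops)
    WITH (lists = {ivfflat_lists(10_000)});
"""

print(ivfflat_lists(10_000))   # 100
print(ivfflat_lists(100_000))  # 316
```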
Model Selection
Text Embeddings:
Model: all-MiniLM-L6-v2 (Sentence Transformers)
Pros:
✓ Fast (~50 ms per embedding on CPU)
✓ Small (~90 MB model)
✓ Good quality for e-commerce text
✓ 384 dimensions (manageable)
✓ Free and open source
Use for:
• Product descriptions
• Search queries
• Category matching
• Semantic similarity
Alternatives if needed:
• all-mpnet-base-v2 (better quality, slower, 768 dims)
• Multilingual models (if serving multiple countries)
Image Embeddings:
Model: CLIP (openai/clip-vit-base-patch32)
Pros:
✓ Understands both images AND text
✓ Can search images with text!
✓ Good visual similarity
✓ 512 dimensions
✓ Free and open source
Use for:
• Product images
• Visual search
• Style matching
• "Shop the look"
Unique Feature:
CLIP can compare text with images!
Query: "red shoes" (text embedding)
Find: red shoe products (image embedding)
Cross-modal search!
Optimization Strategies
Caching:
Strategy 1: Cache Similar Products
• After computing similarities, cache the results
• TTL: 24 hours (products don't change often)
• Key: product_id
• Value: [similar_product_ids]
Strategy 2: Cache Query Embeddings
• Common searches generate the same embeddings
• Cache: "red shoes" → [0.23, 0.87, ...]
• TTL: 1 hour
• Saves Python service calls
Strategy 3: Cache Popular Product Embeddings
• Keep hot product embeddings in Redis
• Faster than a PostgreSQL lookup
• Updated when the product changes
Batch Processing:
Instead of:
• Generating 1 embedding → 50 ms
• Generating 100 embeddings → 5 seconds (sequential)
Better:
• Generate 100 embeddings in one batch → 2 seconds
• 2.5x faster by batching
Use for:
• Initial product catalog population
• Nightly updates
• Bulk imports
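A sketch of the batching loop, with `embed_batch` standing in for a call to the embedding model (e.g. wrapping `model.encode(batch)` in the Python service) — the callable and its signature are assumptions for illustration:

```python
from typing import Callable, Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_catalog(texts: list[str],
                  embed_batch: Callable[[list[str]], list[list[float]]],
                  batch_size: int = 100) -> list[list[float]]:
    """Embed every text, batch by batch, amortizing per-call model overhead."""
    embeddings: list[list[float]] = []
    for batch in chunked(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings
```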
Index Tuning:
pgvector ivfflat parameters:
lists: number of clusters
• More lists = faster search, lower recall
• Fewer lists = slower search, higher recall
• Sweet spot: ≈ √(number of products)
• 10,000 products → lists = 100
• 100,000 products → lists ≈ 316
Recommendation:
Start with lists = 100,
monitor search speed and recall,
and adjust if needed.
📱 Feed Algorithm (Complete Flow)
Feed Generation Process (Step-by-Step)
User Opens App → Request Feed
REQUEST:
GET /api/feed?limit=20
HEADERS:
Authorization: Bearer {user_token}
Backend Process:
Step 1: User Profile Loading (< 50 ms)
Check Redis cache:
• User's following list (key: user:{id}:following)
• User's interested categories (key: user:{id}:categories)
• User's seen posts (key: user:{id}:seen_posts)
If not cached:
• Query PostgreSQL
• Store in Redis (TTL: 1 hour for following, 6 hours for categories)
Step 2: Candidate Gathering (< 200 ms)
Parallel queries (executed simultaneously):
Query A: Posts from Following
• Posts from the last 7 days
• From sellers the user follows
• Not yet seen by the user
• In stock
→ Returns ~40-60 posts
Query B: Trending in Categories
• User's top 3 categories
• High engagement rate (> 5%)
• Posted in the last 48 hours
• Marked as trending (pre-computed)
→ Returns ~20-30 posts
Query C: Local Sellers
• Same city/region as the user
• Active posts (last 7 days)
• Not yet seen by the user
→ Returns ~10-20 posts
Query D: Sponsored
• Active campaigns
• Targeted to the user's interests
• Budget remaining
→ Returns ~5-10 posts
Total candidates: ~80-120 posts
Step 3: Filtering (< 50 ms)
Remove:
• Posts already seen by the user (Redis lookup)
• Out-of-stock products
• Blocked/reported sellers
• Duplicate products
• Posts from blocked users
Remaining: ~60-100 posts
Step 4: Scoring Each Post (< 300 ms)
For each post, calculate a score (0-100):
A. Social Score (35 points max):
├── Following seller? +15
├── Friends engaged? +2 per friend (max 10)
├── Seller verified? +5
└── Seller rating: +0 to 5 (based on stars)
B. Engagement Prediction (25 points):
├── Category match? +10
├── Price in range? +8
└── Similar to past likes? +7
C. Personalization (20 points):
├── Matches search keywords? +2 each (max 8)
├── Local seller? +7
└── Language/preferences? +5
D. Recency (15 points):
├── < 1 hour: +15
├── < 6 hours: +12
├── < 24 hours: +8
├── < 3 days: +4
└── Older: +1
E. Quality (5 points):
├── 3+ images? +2
├── Detailed description? +1
└── High engagement rate? +2
Example Post Score:
• Social: 15 (following) + 4 (2 friends) + 5 (verified) + 4.5 (rating) = 28.5
• Engagement: 10 (category) + 8 (price) + 7 (similar) = 25
• Personalization: 4 (keywords) + 7 (local) + 5 (prefs) = 16
• Recency: 12 (< 6 hours) = 12
• Quality: 2 (images) + 1 (description) + 2 (engagement) = 5
→ Total: 86.5 points ⭐⭐⭐
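The rubric above can be sketched as a plain function; all field names are illustrative assumptions, and the point values mirror the tables in Step 4. The `example` dict reproduces the worked example:

```python
def score_post(p: dict) -> float:
    """Feed score on a 0-100 scale; field names are illustrative."""
    social = (
        (15 if p["following_seller"] else 0)
        + min(2 * p["friends_engaged"], 10)
        + (5 if p["seller_verified"] else 0)
        + p["seller_rating"]  # 0-5 stars map 1:1 to points
    )
    engagement = (
        (10 if p["category_match"] else 0)
        + (8 if p["price_in_range"] else 0)
        + (7 if p["similar_to_likes"] else 0)
    )
    personalization = (
        min(2 * p["keyword_matches"], 8)
        + (7 if p["local_seller"] else 0)
        + (5 if p["prefs_match"] else 0)
    )
    h = p["age_hours"]
    recency = 15 if h < 1 else 12 if h < 6 else 8 if h < 24 else 4 if h < 72 else 1
    quality = (
        (2 if p["images"] >= 3 else 0)
        + (1 if p["detailed_description"] else 0)
        + (2 if p["high_engagement_rate"] else 0)
    )
    return social + engagement + personalization + recency + quality

# Reproduces the worked example: 28.5 + 25 + 16 + 12 + 5 = 86.5
example = dict(
    following_seller=True, friends_engaged=2, seller_verified=True,
    seller_rating=4.5, category_match=True, price_in_range=True,
    similar_to_likes=True, keyword_matches=2, local_seller=True,
    prefs_match=True, age_hours=4, images=3,
    detailed_description=True, high_engagement_rate=True,
)
print(score_post(example))  # 86.5
```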
Step 5: Ranking & Diversity (< 100 ms)
Primary sort: by score (highest first)
Apply diversity rules:
1. No more than 3 consecutive posts from the same seller
If posts N, N+1, and N+2 are from Seller A:
→ Move post N+2 down
2. Category distribution in the top 10
Count categories in positions 1-10
If one category > 5 posts:
→ Demote some, promote others
3. Price variance
Check the price distribution in the top 20
If all high-priced or all low-priced:
→ Inject middle-priced items
4. Content type mix
Aim for: 70% product posts, 20% reviews, 10% tutorials
Adjust positions to achieve balance
5. Freshness boost (random)
Randomly boost 2-3 recent high-quality posts (score > 70)
Prevents the feed from being too predictable
6. Sponsored insertion
Insert 1 sponsored post at position 7-10
Feels native, not intrusive
Step 6: Pagination & Response (< 50 ms)
Take the top 20 posts from the ranked list
Generate a cursor:
• cursor = ID of the 20th post
• Next request uses: ?cursor=12345&limit=20
• Returns posts with ID < 12345
Mark as seen:
• Add post IDs to the Redis set user:{id}:seen_posts
• TTL: 30 days
• Prevents showing the same posts again
Track impressions:
• Log to analytics: the user viewed these posts
• Used for engagement metrics
• Improves future recommendations
Response format:
{
"posts": [...20 posts with full data...],
"nextCursor": "12345",
"hasMore": true
}
Total time: ~750 ms (well under the 1-second target!)
Feed Diversity Strategies
Why Diversity Matters:
Without diversity:
• All posts from one seller (boring!)
• All the same category (no discovery)
• All the same price range (limits the audience)
• A predictable feed (the user stops scrolling)
With diversity:
• Varied content (keeps interest)
• Category discovery (impulse buys)
• Price options (something for everyone)
• Surprising finds (engagement boost)
Diversity Techniques:
1. Temporal Diversity (Avoid Staleness)
Problem: old high-scored posts dominate
Solution: recency decay
• Posts > 3 days old: reduce score by 30%
• Posts > 7 days old: reduce score by 60%
• Forces fresh content into the feed
Balance:
• Still show quality old posts if the score is very high
• But generally prioritize newer content
2. Seller Diversity (Avoid a Spammy Feeling)
Problem: a popular seller floods the feed
Solution: consecutive limit
• Track the last 3 posts shown
• If posts N, N+1, N+2 are from the same seller:
• Demote post N+2 by 20 positions
• Gives other sellers a chance
Exception:
• The user explicitly follows only one seller
• Then it's okay to show more from that seller
3. Category Diversity (Enable Discovery)
Problem: the user gets pigeonholed into one category
Solution: category quotas in the top 20
• Max 8 posts from the dominant category
• Min 2 posts from each of the user's top 3 categories
• 2-4 posts from exploratory categories
Example:
User loves "Electronics" (dominant)
Also views "Fashion", "Home"
Feed includes:
• 7 Electronics posts
• 4 Fashion posts
• 3 Home posts
• 3 Sports posts (exploratory)
• 3 Beauty posts (exploratory)
4. Price Diversity (Cater to Moods)
Problem: all luxury or all budget items
Solution: price distribution
Target in the top 20 posts:
• 30% budget (< $50)
• 40% mid-range ($50-200)
• 30% premium (> $200)
Rationale:
• Some users are browsing casually (show budget)
• Some are ready to buy (show mid-range)
• Some are shopping aspirationally (show premium)
• Different moods, different needs
5. Content Type Diversity
Mix post types:
• 70% direct product posts (main content)
• 15% review/testimonial posts (social proof)
• 10% tutorial/how-to posts (educational value)
• 5% behind-the-scenes (brand building)
Prevents the feed from being a pure sales pitch
Provides value beyond "buy now"
Trending Algorithm
What Makes Content "Trending"?
Criteria:
1. High Engagement Rate
(Likes + Comments + Shares) ÷ Views > 10%
2. Velocity (speed of engagement)
Gaining engagement faster than average
3. Recency
Posted within the last 48 hours
4. Social Spread
Engaged by users from different networks
(not just one seller's followers)
Trending Score Formula:
Trending Score =
(0.40 × Engagement Rate) +
(0.30 × Velocity Score) +
(0.20 × Reach Score) +
(0.10 × Recency Boost)
Where (each component is expressed on a 0-100 scale):
Engagement Rate = 100 × (Likes + Comments × 3 + Shares × 5) / Views
Velocity Score = (current hourly rate ÷ average hourly rate), scaled to 0-100 (a 3× ratio maps to 100, as in the example below)
Reach Score = 100 × (unique user networks engaged / total active networks)
Recency Boost = 100 if < 6 hours, 80 if < 24 hours, 60 if < 48 hours
Example:
Post:
• 1000 views, 80 likes, 20 comments, 10 shares
• Posted 4 hours ago
• Gaining 15 engagements/hour (average: 5/hour)
• Engaged by users from 8 different networks (20 active)
Engagement Rate = 100 × (80 + 20×3 + 10×5) / 1000 = 100 × 190/1000 = 19
Velocity = 15/5 = 3.0 (3× faster than average) → scaled: 3.0 × 100/3 = 100
Reach = 100 × 8/20 = 40
Recency = 100 (< 6 hours)
Trending Score = (0.40 × 19) + (0.30 × 100) + (0.20 × 40) + (0.10 × 100)
= 7.6 + 30 + 8 + 10
= 55.6 → mark as trending!
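Putting the formula and worked example together as a sketch — note one assumption: to make the weights line up with the worked example, the velocity ratio is capped at 3× the average and rescaled to 0-100 (the × 100/3 factor):

```python
def trending_score(views, likes, comments, shares,
                   hourly_rate, avg_hourly_rate,
                   networks_engaged, total_networks, age_hours):
    """Trending score on a 0-100 scale (all components scaled to 0-100)."""
    engagement_rate = 100 * (likes + comments * 3 + shares * 5) / views
    # Assumption: cap velocity at 3x average, rescale to 0-100
    velocity = min(hourly_rate / avg_hourly_rate, 3.0) * (100 / 3)
    reach = 100 * networks_engaged / total_networks
    recency = 100 if age_hours < 6 else 80 if age_hours < 24 else 60 if age_hours < 48 else 0
    return 0.40 * engagement_rate + 0.30 * velocity + 0.20 * reach + 0.10 * recency

# Worked example from above
score = trending_score(views=1000, likes=80, comments=20, shares=10,
                       hourly_rate=15, avg_hourly_rate=5,
                       networks_engaged=8, total_networks=20, age_hours=4)
print(round(score, 1))  # 55.6
```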
Trending Update Frequency:
A background job runs every 5 minutes:
1. Query posts from the last 48 hours
2. Calculate trending scores
3. Mark posts with a score > 50 as trending
4. Store in Redis (key: trending_posts)
5. The feed service reads from this cache
Why every 5 minutes?
• Balances freshness against server load
• Trending changes relatively slowly (minutes, not seconds)
• Users won't notice a 5-minute delay
🔍 Search System (Complete Flow)
Text Search Process
User Types Query → Get Results
REQUEST:
GET /api/search?q=comfortable+running+shoes&limit=20
BACKEND PROCESS:
Step 1: Query Processing (< 50 ms)
Input: "comfortable running shoes"
Clean & normalize:
• Lowercase: "comfortable running shoes"
• Remove special characters
• Trim whitespace
• Tokenize: ["comfortable", "running", "shoes"]
Check for special patterns:
• Price filter: "under $100"
• Color: "red", "blue"
• Brand: "nike", "adidas"
• Size: "size 10", "large"
Extract filters if present:
query = "comfortable running shoes"
filters = {price_max: 100, category: "footwear"}
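A minimal sketch of Step 1, handling only the price pattern (the other patterns would be analogous regexes); the function name and pattern set are illustrative:

```python
import re

def parse_query(raw: str) -> tuple[str, dict]:
    """Extract simple filters from a search query; patterns are illustrative."""
    query = raw.lower().strip()
    filters: dict = {}
    m = re.search(r"under \$?(\d+)", query)   # price filter, e.g. "under $100"
    if m:
        filters["price_max"] = int(m.group(1))
        query = query.replace(m.group(0), "").strip()
    tokens = re.findall(r"[a-z0-9]+", query)  # tokenize, drop special chars
    return " ".join(tokens), filters

print(parse_query("Comfortable running shoes under $100"))
# ('comfortable running shoes', {'price_max': 100})
```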
Step 2: Generate Query Embedding (< 100 ms)
Call the Python embedding service:
POST http://embedding-service:5000/embed-text
Body: {"text": "comfortable running shoes"}
Python processes:
• Sentence Transformers model already loaded in memory
• Generates embedding: [0.45, 0.78, 0.23, ... 384 numbers]
Returns: {"embedding": [0.45, 0.78, ...], "dimensions": 384}
Spring Boot receives the embedding
Step 3: Similarity Search (< 200 ms)
PostgreSQL query with pgvector:
Find products where:
• text_embedding is similar to the query embedding (cosine similarity)
• Filters apply (price, category, etc.)
• In stock
• Not blocked
Using the pgvector index (ivfflat):
• Fast approximate nearest neighbor search
• Returns the top 100 candidates with similarity scores
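Conceptually, Step 3 is a cosine top-k; a brute-force sketch is below. In production the equivalent SQL would be roughly `ORDER BY text_embedding <=> :query_vec LIMIT 100` (pgvector's cosine-distance operator), which the ivfflat index accelerates approximately instead of scanning every row:

```python
import math

def top_k_similar(query_vec: list[float], products: list[dict], k: int = 5):
    """Brute-force cosine top-k (what the ivfflat index approximates)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    scored = [(cos(query_vec, p["text_embedding"]), p["id"]) for p in products]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]
```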
Step 4: Ranking & Boosting (< 100 ms)
Base ranking: similarity score (0-1)
Apply boosts:
• In stock: +10% score
• Popular (high sales): +8% score
• New arrival (< 30 days): +5% score
• Highly rated (4.5+ stars): +5% score
• Exact keyword match in name: +15% score
Apply user personalization:
• User's favorite brands: +10% score
• User's typical price range: +8% score
• User previously viewed: +5% score
Penalties:
• Low-stock warning: -5% score
• Low rating (< 3.5 stars): -10% score
• No image: -8% score
Final sort: by boosted score
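One way to read the boost list is multiplicatively, each +N% scaling the score by (1 + N/100); whether boosts combine multiplicatively or additively is a design choice the text leaves open. A sketch covering a subset of the listed signals (all field names are illustrative):

```python
def boosted_score(base_similarity: float, product: dict, user: dict) -> float:
    """Apply a subset of the Step 4 boosts/penalties multiplicatively."""
    score = base_similarity
    if product["in_stock"]:
        score *= 1.10          # in stock: +10%
    if product["popular"]:
        score *= 1.08          # high sales: +8%
    if product["new_arrival"]:
        score *= 1.05          # < 30 days old: +5%
    if product["rating"] >= 4.5:
        score *= 1.05          # highly rated: +5%
    if product["exact_name_match"]:
        score *= 1.15          # exact keyword match in name: +15%
    if product["brand"] in user["favorite_brands"]:
        score *= 1.10          # favorite brand: +10%
    if product["rating"] < 3.5:
        score *= 0.90          # low rating: -10%
    if not product["has_image"]:
        score *= 0.92          # no image: -8%
    return score
```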
Step 5: Response (< 50ms)
Take top 20 results
Format response:
{
"query": "comfortable running shoes",
"results": [
{
"id": 123,
"name": "Nike Air Max Running Shoes",
"price": 89.99,
"similarity": 0.92,
"rating": 4.7,
...
},
... 19 more
],
"total": 847,
"took": "287ms"
}
Total Time: ~500ms (good search experience!)
Visual Search Process
User Uploads Image → Get Visually Similar Products
REQUEST:
POST /api/search/visual
Content-Type: multipart/form-data
Body: image file
BACKEND PROCESS:
Step 1: Image Reception & Validation (< 100 ms)
Receive the uploaded file
Validate:
• File size < 10 MB
• Format: JPG, PNG, WEBP
• Not corrupted
Preprocess:
• Resize if needed (max 1024x1024)
• Convert to RGB if grayscale
• Optimize file size
Temporarily store:
• Option A: memory (for immediate processing)
• Option B: temp file system (for large images)
• Option C: S3 (for tracking/analytics)
Step 2: Generate Image Embedding (< 300 ms)
Convert the image to bytes
Call the Python embedding service:
POST http://embedding-service:5000/embed-image-bytes
Body: {"image_base64": "iVBORw0KGgo..."}
Python processes:
• CLIP model already loaded in memory
• Processes the image pixels (not the filename!)
• Understands visual features: color, shape, style
• Generates embedding: [0.91, 0.23, 0.67, ... 512 numbers]
Returns: {"embedding": [0.91, 0.23, ...], "dimensions": 512}
Spring Boot receives the embedding
Step 3: Visual Similarity Search (< 200 ms)
PostgreSQL query with pgvector:
Find products where:
• image_embedding is similar to the upload's embedding
• In stock
• Has an image (obviously)
Using the pgvector index on image_embedding:
• Fast ANN search
• Returns the top 50 candidates with visual similarity scores
Step 4: Ranking & Filtering (< 100 ms)
Base ranking: visual similarity (0-1)
Filter options (if the user specifies):
• Category filter: "Show only shoes"
• Price range: "$50-150"
• Brand preference
Apply boosts:
• Popular in category: +10%
• High-quality images: +5%
• Multiple product images: +5%
Sort by final score
Step 5: Response (< 50ms)
Return top 20 visually similar products
Response format:
{
"results": [
{
"id": 456,
"name": "Similar Red Sneakers",
"image_url": "...",
"similarity": 0.88,
"price": 79.99
},
... 19 more
],
"message": "Found 20 visually similar products"
}
Total Time: ~750ms-1s (acceptable for visual search!)
Search Enhancements
Query Understanding (NLP Techniques):
Intent detection:
"gift for dad" → gift intent
→ Boost: gift-appropriate products
→ Filter: hide inappropriate items
"cheap laptop" → budget intent
→ Sort: price low to high
→ Filter: < $500
"best running shoes" → quality intent
→ Sort: rating high to low
→ Boost: reviews, testimonials
"red nike shoes size 10" → specific intent
→ Exact filters applied
→ Precise matching
Spell correction:
User types: "iphone 15 pro mac"
System detects: "mac" is likely a typo
Suggestion: "Did you mean: iphone 15 pro max?"
Auto-correct for common typos:
• "iphone" → "iPhone"
• "macbook" → "MacBook"
• "airpods" → "AirPods"
Search Suggestions (As User Types):
User types: "run"
Suggestions appear:
• "running shoes" (popular)
• "running shorts" (trending)
• "running watch" (category)
Based on:
• Popular searches (last 7 days)
• The user's past searches
• Trending products
• Category completions
After the search results, suggest:
"People also searched for:"
• "nike running shoes" (brand-specific)
• "trail running shoes" (category variant)
• "running shoes for women" (gender variant)
Generated from:
• Search session data (what others searched next)
• Similar query embeddings
• Category relationships
🛠️ Technology Stack
Backend (Spring Boot)
Core Framework:
- Spring Boot 3.2+
- Java 17 or 21
- Spring Web (REST APIs)
- Spring Data JPA (Database)
- Spring Security (Authentication)
- Spring WebSocket (Real-time features)
Libraries & Dependencies:
- Lombok (reduce boilerplate)
- MapStruct (object mapping)
- Hibernate (ORM)
- Jackson (JSON processing)
- Resilience4j (circuit breaker, retry)
- Micrometer (metrics)
Database
Primary Database:
- PostgreSQL 16+
- pgvector extension (for vector similarity)
Caching:
- Redis 7+
- Used for: sessions, feed cache, seen posts, hot data
Embedding Service (Python)
Framework:
- Flask or FastAPI
- Python 3.10+
ML Libraries:
- Sentence Transformers (text embeddings)
- Transformers (Hugging Face)
- CLIP (image embeddings)
- PyTorch or TensorFlow
- NumPy, Pandas
Models:
- Text: all-MiniLM-L6-v2 (384 dimensions)
- Image: openai/clip-vit-base-patch32 (512 dimensions)
Infrastructure
Containerization:
- Docker
- Docker Compose (local development)
Orchestration (Production):
- Kubernetes (if scaling large)
- OR Docker Swarm (simpler alternative)
- OR Cloud services (AWS ECS, Google Cloud Run)
Cloud Services (Optional):
- AWS: S3 (images), RDS (database), ElastiCache (Redis)
- Google Cloud: Cloud Storage, Cloud SQL, Memorystore
- Azure: Blob Storage, Azure Database, Azure Cache
Monitoring & Observability
Metrics:
- Prometheus (metrics collection)
- Grafana (visualization)
- Spring Boot Actuator (health endpoints)
Logging:
- Logback or Log4j2
- ELK Stack (Elasticsearch, Logstash, Kibana) optional
- OR Cloud logging (CloudWatch, Stackdriver)
Tracing:
- Spring Cloud Sleuth + Zipkin (optional)
- Jaeger (distributed tracing)
🚀 Implementation Phases
Phase 1: MVP - Basic Recommendations (Weeks 1-3)
Goal: Launch with working feed and basic recommendations
Features to Implement:
- User can create posts about products
- Follow/unfollow sellers
- Like, comment, share posts
- Basic feed: Posts from following + trending
- Simple scoring (social + recency only)
- Cursor-based pagination
โ Basic Product Similarity
- "Similar Products" using category + price range
- No embeddings yet (too complex for MVP)
- Collaborative filtering: "Bought together"
โ Simple Search
- Keyword-based search
- Filter by category, price
- Sort by relevance, price, rating
- No semantic search yet
โ Engagement Tracking
- Track views, likes, comments, shares
- Store user activity (views, searches)
- Basic analytics
โ Group Buy (Core)
- Create group purchase
- Join existing group
- Group countdown timer
- Basic "find groups" (category match)
โ Installment Display
- Show monthly payment option
- Simple affordability check
- Payment plan calculator
No Embeddings in Phase 1!
- Focus on core functionality
- Prove product-market fit
- Gather user data for Phase 2
Success Metrics:
- Feed engagement rate > 3%
- User retention (day 7) > 30%
- Group buy completion rate > 50%
Phase 2: Enhanced Recommendations (Weeks 4-6)
Goal: Add semantic search and better personalization
Features to Implement:
โ Text Embeddings
- Set up Python embedding service
- Generate text embeddings for all products
- Update products when created/edited
โ Semantic Search
- Search using text embeddings
- Understand query meaning
- Better search relevance
โ Improved Product Similarity
- Use text embeddings
- Semantic similarity matching
- Better "Similar Products" section
โ User Interest Profiling
- Track user's category preferences
- Calculate interest scores
- Use in feed personalization
โ Enhanced Feed Algorithm
- Add personalization score
- Category matching
- Price range matching
- Engagement prediction
โ Group Buy Intelligence
- Match users to relevant groups using embeddings
- Smart group invitations (social graph)
- Urgency notifications
- "Complete the group" campaigns
โ Installment Recommendations
- Budget analysis per user
- Premium product suggestions
- Upgrade recommendations
- Bundle suggestions
Success Metrics:
- Search click-through rate > 8%
- "Similar products" click rate > 12%
- Feed engagement rate > 5%
- Group buy completion rate > 65%
Phase 3: Visual Search & Advanced Features (Weeks 7-9)
Goal: Add image-based features and optimize
Features to Implement:
โ Image Embeddings
- Generate image embeddings for products
- Store in database
- Index for fast search
โ Visual Search
- Upload image to find products
- Image similarity matching
- Visual style recommendations
โ Shop the Look
- Identify items in images
- Match to catalog
- Complete outfit suggestions
โ Combined Search
- Text + Image combined queries
- "Find red shoes like this image"
- Cross-modal search (CLIP)
โ Feed Optimization
- Diversity rules
- Quality boosting
- Trending algorithm refinement
โ Caching Strategy
- Redis caching for hot data
- Pre-computed similarities
- Query caching
โ A/B Testing Framework
- Test different scoring weights
- Test feed compositions
- Measure impact on conversions
Success Metrics:
- Visual search adoption > 15% of searches
- Feed engagement rate > 7%
- Conversion rate > 3%
- Average session time > 12 minutes
Phase 4: Optimization & ML (Weeks 10+)
Goal: Fine-tune and scale
Features to Implement:
โ Performance Optimization
- Query optimization
- Index tuning
- Embedding search speed improvements
- Caching refinements
โ Advanced Personalization
- ML-based engagement prediction
- Click-through rate modeling
- Purchase probability scoring
โ Recommendation Quality
- Feedback loops (did user buy?)
- Continuously improve scoring weights
- Seasonal adjustments
โ Scaling
- Load testing
- Database replication
- Caching layers
- CDN for images
โ Analytics Dashboard
- Recommendation performance metrics
- A/B test results
- User behavior insights
- Business intelligence
Success Metrics:
- System response time < 500ms (p95)
- Handle 10,000+ concurrent users
- Recommendation click rate > 15%
- Overall conversion rate > 5%
⚡ Performance & Scaling
Performance Targets
API Response Times (95th percentile):
Feed API: < 500ms
Search API: < 600ms
Visual Search: < 1.5s
Product Detail: < 200ms
Group Buy List: < 300ms
Database Query Times:
Simple queries: < 50ms
Vector similarity (with index): < 200ms
Complex joins: < 300ms
Caching Hit Rates:
User following list: > 90%
Trending posts: > 95%
Product embeddings: > 80%
Search query results: > 60%
Optimization Strategies
1. Database Optimization
Indexing Strategy:
โข Primary keys (automatic)
โข Foreign keys (user_id, product_id, etc.)
โข Frequently filtered columns (category, price, created_at)
โข Vector columns (pgvector ivfflat index)
โข Composite indexes for common queries
Query Optimization:
โข Use EXPLAIN ANALYZE to identify slow queries
โข Avoid N+1 queries (use JOIN or batch fetch)
โข Limit result sets appropriately
โข Use covering indexes where possible
Connection Pooling:
โข HikariCP (default in Spring Boot)
โข Pool size: 10-20 connections per instance
โข Connection timeout: 30 seconds
โข Idle timeout: 10 minutes
2. Caching Strategy
Redis Cache Layers:
Layer 1: Hot Data (TTL: 5-15 minutes)
โข Trending posts
โข Active group buys
โข Real-time counters
Layer 2: Warm Data (TTL: 1-6 hours)
โข User following lists
โข User interest profiles
โข Popular products
Layer 3: Cold Data (TTL: 24 hours)
โข Similar product lists (pre-computed)
โข Static content
โข Configuration data
Layer 4: Session Data (TTL: 7-30 days)
โข User seen posts
โข Shopping cart
โข Search history
Cache Invalidation:
โข Update cache when data changes
โข Lazy invalidation (TTL expiry)
โข Active invalidation (delete on update)
3. Embedding Service Optimization
Model Loading:
โข Load models at startup (not per request)
โข Keep in memory (RAM)
โข Use GPU if available (5-10x faster)
Batch Processing:
โข Process multiple embeddings together
โข Amortize model overhead
โข 2-3x throughput improvement
Request Queuing:
โข Queue embedding requests
โข Batch process every 100ms
โข Balance latency vs throughput
Caching:
โข Cache common query embeddings
โข Cache product embeddings in Redis
โข Reduce Python service calls
4. Feed Generation Optimization
Pre-computation:
โข Background job: Calculate trending posts (every 5 min)
โข Background job: Update user interests (daily)
โข Background job: Pre-compute popular similarities
Parallel Processing:
โข Fetch candidates in parallel (following + trending + local)
โข Score posts in parallel (if list is large)
โข Use CompletableFuture or reactive programming
Result Caching:
โข Cache full feed for 1-2 minutes (same user, same request)
โข Cache candidate lists for 5 minutes
โข Invalidate on user action (post, like, follow)
5. Search Optimization
Index Strategy:
โข pgvector ivfflat index on embeddings
โข Full-text search index (if needed)
โข Composite indexes on filters
Query Rewriting:
โข Combine filters in single query
โข Use covering indexes
โข Avoid sequential scans
Result Caching:
โข Cache popular search queries
โข TTL: 10-30 minutes
โข Personalized results: 2-5 minutes
Scaling Strategy
Horizontal Scaling (Add More Servers):
Application Layer (Spring Boot):
โข Stateless design (session in Redis, not memory)
โข Load balancer distributes requests
โข Scale out to 5, 10, 20+ instances
โข Auto-scaling based on CPU/memory
Embedding Service (Python):
โข Run multiple instances
โข Load balancer in front
โข Each instance has model loaded
โข Scale based on request rate
Database (PostgreSQL):
โข Read replicas for heavy read load
โข Write to master, read from replicas
โข Connection pooling per instance
โข Separate analytics queries to replica
Vertical Scaling (Bigger Servers):
When to Scale Up:
โข Database: More RAM for cache, faster CPU for queries
โข Redis: More RAM for bigger cache
โข Embedding service: GPU for faster inference
Sweet Spot:
โข Application: 4-8 CPU, 8-16 GB RAM
โข Database: 8-16 CPU, 32-64 GB RAM
โข Redis: 4 CPU, 16-32 GB RAM
โข Embedding: 4-8 CPU, 16-32 GB RAM, GPU optional
Caching Layers:
Level 1: Application Cache (Spring @Cacheable)
โข Short-lived data (seconds to minutes)
โข User session data
โข Request-scoped cache
Level 2: Redis Cache
โข Medium-lived data (minutes to hours)
โข Cross-instance sharing
โข High performance
Level 3: CDN (Images, Static Assets)
โข Long-lived data (hours to days)
โข Globally distributed
โข Reduce origin load
Level 4: Database Query Cache
โข PostgreSQL query result cache
โข Automatic by database
โข Benefit from repeated queries
Database Partitioning (Future):
When Needed:
โข >10 million products
โข >100 million posts
โข Query performance degrades
Partitioning Strategy:
โข Range partition by date (posts)
โข Hash partition by user_id (user data)
โข List partition by category (products)
Sharding (Advanced):
โข Separate database per region
โข Shard by user_id hash
โข Cross-shard queries expensive (avoid)
📊 Metrics & Monitoring
Key Performance Indicators (KPIs)
User Engagement:
Feed Metrics:
โข Feed Engagement Rate: (Likes + Comments + Shares) / Views
Target: >5%
โข Scroll Depth: Average posts viewed per session
Target: >30 posts
โข Session Duration: Time spent in app
Target: >10 minutes
โข Return Rate: % users returning within 7 days
Target: >40%
Conversion Metrics:
Purchase Funnel:
• Feed View → Product Page: 8-12%
• Product Page → Add to Cart: 15-20%
• Cart → Purchase: 30-40%
• Overall conversion: 3-5%
Group Buy:
• Group View → Join: 25-35%
• Group Completion Rate: 70-80%
• Average group fill time: < 48 hours
Installment:
• Installment Shown → Selected: 40-50%
• Premium product with installment: 20-25% conversion boost
Recommendation Quality:
Click-Through Rates:
โข "Similar Products": 12-18%
โข Search Results: 8-12%
โข Feed Recommendations: 5-8%
โข Group Buy Suggestions: 20-30%
Relevance Metrics:
โข Precision@10: >60% (of top 10 results are relevant)
โข Mean Reciprocal Rank: >0.7 (relevant result in top 3)
โข User Satisfaction: Survey score >4/5
System Performance:
Response Times (p95):
โข Feed Load: < 500ms
โข Search: < 600ms
โข Product Page: < 200ms
โข Visual Search: < 1.5s
Availability:
โข Uptime: >99.9% (less than 45 min downtime/month)
โข Error Rate: <0.1%
โข Successful API calls: >99.5%
Resource Utilization:
โข CPU: 60-80% (headroom for spikes)
โข Memory: 70-85%
โข Database connections: 50-70% of pool
Monitoring Setup
Application Monitoring:
Spring Boot Actuator:
โข /actuator/health: System health
โข /actuator/metrics: Performance metrics
โข /actuator/prometheus: Prometheus export
Custom Metrics:
โข Feed generation time
โข Embedding generation time
โข Similarity search time
โข Cache hit rates
โข Recommendation click-through rates
Database Monitoring:
PostgreSQL Metrics:
โข Query execution time
โข Index usage
โข Table sizes
โข Connection pool status
โข Slow query log
pgvector Specific:
โข Vector search time
โข Index efficiency
โข Embedding storage size
Cache Monitoring:
Redis Metrics:
โข Hit rate
โข Memory usage
โข Eviction rate
โข Command execution time
โข Key expiration rate
Business Metrics Dashboard:
Real-Time:
โข Active users now
โข Feeds served (per minute)
โข Searches executed
โข Group buys active
โข Purchases completed
Daily:
โข New users
โข Daily active users (DAU)
โข Group buy completion rate
โข Average order value
โข Conversion rate
Weekly/Monthly:
โข User retention (cohort analysis)
โข Revenue trends
โข Top categories
โข Best performing recommendations
โข A/B test results
🎯 Success Criteria
Phase 1 (MVP) Success:
- โ Feed loads in < 1 second
- โ Users scroll through 20+ posts per session
- โ Group buy completion rate > 50%
- โ Basic recommendations work (not perfect)
- โ System handles 1,000 concurrent users
Phase 2 (Enhanced) Success:
- โ Search relevance improved (user feedback positive)
- โ "Similar products" click rate > 12%
- โ Feed engagement rate > 5%
- โ Group buy completion rate > 65%
- โ Installment adoption > 30% for expensive items
Phase 3 (Advanced) Success:
- โ Visual search works well (user satisfaction > 4/5)
- โ Feed engagement rate > 7%
- โ Overall conversion rate > 3%
- โ System handles 10,000+ concurrent users
- โ Recommendation quality high (precision > 60%)
Long-Term Success:
- โ Platform is primary discovery channel (not search)
- โ Users return 3+ times per week
- โ Average session > 15 minutes
- โ Conversion rate > 5%
- โ User satisfaction > 4.5/5
- โ System scales to 1M+ users
📝 Final Notes
Critical Principles
1. Start Simple, Scale Smart:
- Phase 1: No embeddings, basic scoring
- Phase 2: Add text embeddings
- Phase 3: Add image embeddings
- Don't over-engineer from day 1
2. User Data is Gold:
- Every interaction teaches the system
- Track everything (respecting privacy)
- Use data to improve recommendations
- Feedback loops are essential
3. Social Proof Wins:
- "7 people in group" > "Save $150"
- "Your friend bought this" > "Trending"
- Community drives engagement
- Leverage network effects
4. Affordability = Access:
- Group buying makes expensive affordable
- Installments remove price barriers
- Together = 2-3x conversion boost
- Key differentiator for Nexgate
5. Feed > Search:
- Most users browse, not search
- Discovery drives impulse buys
- Feed is primary experience
- Search is support feature
6. Diversity Matters:
- Avoid echo chambers
- Mix categories, prices, sellers
- Enable serendipitous discovery
- Balance personalization with exploration
7. Performance is Feature:
- Fast feed = more scrolling
- Slow search = user leaves
- Every 100ms matters
- Optimize relentlessly
When to Use What
Embeddings:
- Product similarity
- Semantic search
- Visual search
- Style matching
Scoring System:
- Main feed
- Personalization
- Trending
- Urgency ranking
Social Graph:
- Who to follow
- Group invitations
- Friend activity
- Network effects
Collaborative Filtering:
- "Bought together"
- "Also viewed"
- Purchase patterns
- Cross-sell
All Together:
- Best results come from hybrid approach
- Combine multiple signals
- Weight based on context
- Continuous improvement
Conclusion
This architecture document provides a complete blueprint for Nexgate's recommendation system. It combines:
- Embeddings for product similarity and semantic/visual search
- A weighted scoring system for the personalized feed
- Social signals and collaborative filtering for discovery
The system is designed to:
- Start simple (MVP with basic features)
- Scale smart (add complexity as needed)
- Learn continuously (from user behavior)
- Prioritize performance (fast = engaging)
- Drive conversions (discovery → purchase)
Remember: The best recommendation system is one that:
- Serves users first (not algorithms)
- Balances personalization with discovery
- Performs fast and reliably
- Improves over time with data
- Drives business goals
Good luck building Nexgate!
Document Version: 1.0
Last Updated: November 2025
Next Review: After Phase 1 completion
Understanding Recommendation Systems - From Zero to Hero
What You'll Learn
This guide explains recommendation systems from first principles, with real-world examples, formulas, and the math behind them. Concepts come first; small code sketches appear only where they make an idea concrete.
Chapter 1: What Are Recommendation Systems?
The Simple Definition
A recommendation system is a tool that predicts what you might like based on:
- What you've done before
- What others like you have done
- Properties of the items themselves
Real-World Analogy
Imagine a smart bookstore clerk:
- Remembers every book you bought
- Knows what other customers bought
- Understands book genres and themes
- Suggests books you'll probably enjoy
That's essentially what a recommendation system does!
Chapter 2: The Three Main Types
Type 1: Content-Based Filtering
Concept: Recommend items similar to what you liked before.
How it works:
- Analyze features of items you liked
- Find other items with similar features
- Recommend those items
Example:
You liked:
- "Harry Potter" (Fantasy, Magic, Young Adult, Adventure)
- "Lord of the Rings" (Fantasy, Magic, Epic, Adventure)
System recommends:
- "The Hobbit" (Fantasy, Magic, Adventure) ✅ Very similar!
- "Chronicles of Narnia" (Fantasy, Magic, Young Adult) ✅ Good match!
The Math Behind It:
Each item is represented as a feature vector:
Harry Potter = [Fantasy: 1, Magic: 1, Young Adult: 1, Adventure: 1, Romance: 0]
Lord of the Rings = [Fantasy: 1, Magic: 1, Young Adult: 0, Adventure: 1, Romance: 0]
The Hobbit = [Fantasy: 1, Magic: 1, Young Adult: 0, Adventure: 1, Romance: 0]
Similarity Calculation (Cosine Similarity):
Similarity = (A · B) / (||A|| × ||B||)
Where:
A · B = Dot product (multiply matching features)
||A|| = Magnitude of vector A
||B|| = Magnitude of vector B
Result: Number between 0 (totally different) and 1 (identical)
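Although this guide is concept-first, a tiny Python sketch makes the formula concrete. The vectors are the book examples above; the function is a direct transcription of the cosine formula:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

# Features: [Fantasy, Magic, Young Adult, Adventure, Romance]
harry_potter = [1, 1, 1, 1, 0]
lotr = [1, 1, 0, 1, 0]
the_hobbit = [1, 1, 0, 1, 0]

print(round(cosine_similarity(lotr, the_hobbit), 2))    # identical features -> 1.0
print(round(cosine_similarity(harry_potter, lotr), 2))  # -> 0.87, still very close
```

Identical feature vectors score 1.0; sharing three of four active features still scores 0.87, which is why The Hobbit ranks so highly.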
Pros:
- ✅ Doesn't need other users' data
- ✅ Can recommend new items immediately
- ✅ Easy to explain why something was recommended
Cons:
- ❌ Limited to features you can describe
- ❌ Can't discover new interests
- ❌ Gets stuck in a "filter bubble"
Type 2: Collaborative Filtering
Concept: "People like you also liked..."
How it works:
- Find users similar to you
- See what they liked
- Recommend those items to you
Example:
You (Alice):
- Liked: iPhone, MacBook, AirPods
- Rating: 5 stars, 5 stars, 4 stars
Similar User (Bob):
- Liked: iPhone, MacBook, AirPods, Apple Watch
- Rating: 5 stars, 5 stars, 5 stars, 5 stars
Recommendation for Alice:
→ Apple Watch (because Bob, who has similar taste, loves it!)
Two Approaches:
A. User-Based Collaborative Filtering
Formula for User Similarity (Pearson Correlation):
similarity(user_a, user_b) =
Σ(rating_a - avg_a)(rating_b - avg_b)
/ (√[Σ(rating_a - avg_a)²] × √[Σ(rating_b - avg_b)²])
Result: Number between -1 (opposite taste) and 1 (identical taste)
Example Calculation:
Alice's ratings: [5, 4, 3, ?, 2]
Bob's ratings: [5, 5, 3, 4, 2]
Carol's ratings: [1, 2, 3, 4, 5]
Similarity(Alice, Bob) = 0.95 (very similar!)
Similarity(Alice, Carol) = -0.8 (opposite taste!)
Predict Alice's rating for item 4:
→ Use Bob's rating (4) because Bob is most similar
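The Pearson formula above can be sketched in a few lines of Python. The sketch compares only items both users rated (item 4 is excluded because Alice never rated it), so Alice-Bob comes out at roughly the 0.95 in the text, while Alice-Carol lands even more negative than the illustrative -0.8:

```python
import math

def pearson(a, b):
    """Pearson correlation between two users' ratings of the same items."""
    n = len(a)
    avg_a, avg_b = sum(a) / n, sum(b) / n
    num = sum((x - avg_a) * (y - avg_b) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - avg_a) ** 2 for x in a))
           * math.sqrt(sum((y - avg_b) ** 2 for y in b)))
    return num / den

# Item 4 is excluded: Alice never rated it.
alice = [5, 4, 3, 2]
bob   = [5, 5, 3, 2]
carol = [1, 2, 3, 5]

print(round(pearson(alice, bob), 2))    # ~0.95: very similar taste
print(round(pearson(alice, carol), 2))  # strongly negative: opposite taste
```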
B. Item-Based Collaborative Filtering
Instead of finding similar users, find similar items!
Example:
People who bought iPhone also bought:
- iPhone Case (90% of buyers)
- Screen Protector (85% of buyers)
- AirPods (60% of buyers)
- Apple Watch (40% of buyers)
You bought iPhone → Recommend iPhone Case (highest correlation!)
Formula for Item Similarity:
similarity(item_i, item_j) =
Number of users who liked both items
/ √(Users who liked item_i × Users who liked item_j)
Note: this normalized co-occurrence is cosine similarity for binary data. True Jaccard similarity divides by the number of users who liked either item instead.
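A quick sketch of the co-occurrence formula above; the like data is invented for illustration:

```python
import math

# Invented example data: which users liked which items
likes = {
    "iPhone": {"u1", "u2", "u3"},
    "Case": {"u1", "u2", "u4"},
}

def item_similarity(item_i, item_j):
    """Users who liked both / sqrt(product of each item's like count)."""
    both = len(likes[item_i] & likes[item_j])
    return both / math.sqrt(len(likes[item_i]) * len(likes[item_j]))

print(round(item_similarity("iPhone", "Case"), 2))  # 2 / sqrt(3*3) -> 0.67
```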
Pros:
- ✅ Discovers new interests
- ✅ Doesn't need item features
- ✅ Works well with lots of user data
Cons:
- ❌ Cold start problem (new users/items)
- ❌ Sparsity (most users rate few items)
- ❌ Popularity bias (recommends popular items)
Type 3: Hybrid Systems
Concept: Combine multiple approaches for better results!
Common Combinations:
A. Weighted Hybrid
Final Score =
(0.5 × Content-Based Score) +
(0.5 × Collaborative Score)
Example:
Product X:
- Content similarity to your likes: 0.8
- People like you also bought it: 0.6
- Final score: (0.5 × 0.8) + (0.5 × 0.6) = 0.7
B. Switching Hybrid
IF user is new (no history):
→ Use Content-Based (based on item features)
ELSE IF user has lots of history:
→ Use Collaborative (based on similar users)
C. Cascade Hybrid
Step 1: Content-Based filters 1000 → 100 items
Step 2: Collaborative ranks those 100 → Top 10
Step 3: Show top 10 to user
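The weighted and switching hybrids above can be sketched together; the history-size threshold of 20 is an arbitrary assumption for illustration:

```python
def weighted_hybrid(content, collab, w_content=0.5, w_collab=0.5):
    """Weighted hybrid: blend two 0-1 scores with fixed weights."""
    return w_content * content + w_collab * collab

def switching_hybrid(history_size, content, collab):
    """Switching hybrid: fall back to content-based for new users."""
    if history_size == 0:
        return content        # cold start: no behavior to collaborate on
    if history_size >= 20:    # assumed threshold for "lots of history"
        return collab
    return weighted_hybrid(content, collab)

# Product X from the weighted example: content 0.8, collaborative 0.6
print(round(weighted_hybrid(0.8, 0.6), 2))           # -> 0.7
print(switching_hybrid(0, content=0.8, collab=0.6))  # new user -> 0.8
```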
Chapter 3: The Math Explained Simply
Similarity Measures
These are ways to measure "how alike" two things are.
1. Cosine Similarity (Most Common)
Imagine two arrows in space:
Arrow A points → (3, 4)
Arrow B points → (4, 3)
Angle between them = small → Similar!
Angle = 90° → Completely different
Formula:
cosine_similarity = cos(θ) = (A · B) / (|A| × |B|)
Where:
A · B = (3×4) + (4×3) = 12 + 12 = 24
|A| = √(3² + 4²) = √25 = 5
|B| = √(4² + 3²) = √25 = 5
Result = 24 / (5 × 5) = 24/25 = 0.96 (very similar!)
Range: -1 to 1 in general; with non-negative features, 0 (perpendicular) to 1 (identical direction)
2. Euclidean Distance
Think of it as "crow flies" distance:
Point A = (1, 2)
Point B = (4, 6)
Distance = √[(4-1)² + (6-2)²]
= √[9 + 16]
= √25
= 5
Closer distance = More similar
Problem: Doesn't work well with different scales!
Price: $10 vs $15 (difference = 5)
Rating: 3 vs 4 stars (difference = 1)
The price difference dominates unfairly!
Solution: Normalize first (scale everything 0-1)
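The scale problem and its fix can be demonstrated directly. The feature ranges below (price $0-100, rating 1-5 stars) are assumed for illustration:

```python
import math

def euclidean(a, b):
    """Straight-line ("as the crow flies") distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Raw features [price in $, star rating]: the price difference dominates.
print(round(euclidean([10, 3], [15, 4]), 2))  # ~5.1, almost entirely price

# Normalize each feature to 0-1 first (assumed ranges: price $0-100, stars 1-5).
a = [10 / 100, (3 - 1) / 4]   # -> [0.10, 0.50]
b = [15 / 100, (4 - 1) / 4]   # -> [0.15, 0.75]
print(round(euclidean(a, b), 2))  # ~0.25, both features now comparable
```

After normalization, the one-star rating gap contributes more than the $5 price gap, which matches intuition about which difference matters.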
3. Pearson Correlation
Measures if two things move together:
Alice rates: [5, 4, 3, 2, 1]
Bob rates: [5, 4, 3, 2, 1]
→ Perfect correlation = 1.0 (they always agree!)
Alice rates: [5, 4, 3, 2, 1]
Carol rates: [1, 2, 3, 4, 5]
→ Perfect negative correlation = -1.0 (opposite taste!)
Formula:
r = Σ[(x - x̄)(y - ȳ)] / √[Σ(x - x̄)² × Σ(y - ȳ)²]
Where:
x̄ = average of x
ȳ = average of y
Range: -1 (opposite) to +1 (identical)
Matrix Factorization (Advanced!)
The Idea: Break down the user-item matrix into hidden patterns.
Real-World Example:
Movie ratings matrix:
Action Comedy Drama
Alice 5 2 4
Bob 5 1 3
Carol 1 5 2
Hidden factors might be:
Factor 1: "Likes serious content"
Factor 2: "Likes funny content"
Alice = [High Factor 1, Low Factor 2] → Likes Action/Drama
Carol = [Low Factor 1, High Factor 2] → Likes Comedy
This is what Netflix does!
They discovered hidden factors like:
- "Likes quirky independent films"
- "Prefers big-budget blockbusters"
- "Enjoys thought-provoking documentaries"
Formula (Simplified):
Rating = User_Vector · Item_Vector
Alice's vector = [0.9, 0.2] (serious, not funny)
Action movie vector = [0.8, 0.1] (serious, not funny)
Predicted rating = (0.9 × 0.8) + (0.2 × 0.1)
= 0.72 + 0.02
= 0.74 (normalized preference)
≈ 3.7 stars when scaled to a 5-star range
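The dot-product prediction can be sketched directly; the comedy vector is an extra assumed example to show a low score for contrast:

```python
def predicted_preference(user_vec, item_vec):
    """Predicted preference = dot product of latent-factor vectors."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

alice        = [0.9, 0.2]  # [likes serious, likes funny]
action_movie = [0.8, 0.1]
comedy       = [0.1, 0.9]  # assumed vector for contrast

print(round(predicted_preference(alice, action_movie), 2))  # 0.74 -> high
print(round(predicted_preference(alice, comedy), 2))        # 0.27 -> low
```

The same two-number vectors separate Alice's tastes cleanly: aligned factors multiply into a high score, mismatched factors into a low one.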
Chapter 4: Real-World Examples Explained
Example 1: Netflix
What they use: Hybrid system with heavy collaborative filtering + content-based
How it works:
Step 1: Collaborative Filtering
- Find users who rated movies similarly to you
- Weight: 60%
Step 2: Content-Based
- Analyze genres, actors, directors you like
- Weight: 25%
Step 3: Trending/Popular
- What's hot right now
- Weight: 15%
Final Score = (0.6 × Collaborative) + (0.25 × Content) + (0.15 × Trending)
Why it works:
- Cold start: New users get recommendations based on genres they select
- Warm users: Get personalized recommendations from similar users
- Diversity: Trending ensures you see new popular content
Example 2: Amazon
What they use: Primarily item-based collaborative filtering
The Famous Algorithm: "Customers who bought X also bought Y"
How it's calculated:
iPhone → Case: 85% co-purchase rate
iPhone → Screen Protector: 78% co-purchase rate
iPhone → Charger: 65% co-purchase rate
iPhone → Laptop: 5% co-purchase rate
Formula:
Co-purchase rate =
(Times X and Y bought together) / (Times X was bought)
Example:
iPhone bought: 1000 times
iPhone + Case bought together: 850 times
Co-purchase rate = 850/1000 = 85%
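Computed over a toy order log (the orders below are invented for illustration), the same rate looks like this:

```python
from collections import Counter
from itertools import combinations

# Invented orders: each is the set of items bought together
orders = [
    {"iPhone", "Case"},
    {"iPhone", "Case", "Charger"},
    {"iPhone", "Screen Protector"},
    {"iPhone", "Case"},
    {"Laptop", "Charger"},
]

item_counts = Counter()
pair_counts = Counter()
for order in orders:
    item_counts.update(order)
    pair_counts.update(frozenset(p) for p in combinations(sorted(order), 2))

def co_purchase_rate(x, y):
    """Times X and Y were bought together / times X was bought."""
    return pair_counts[frozenset((x, y))] / item_counts[x]

print(co_purchase_rate("iPhone", "Case"))  # 3 of 4 iPhone orders -> 0.75
```

Note the rate is asymmetric: `co_purchase_rate("Case", "iPhone")` divides by Case's count instead, which is why "people who bought X" recommendations depend on which item you start from.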
Why it works:
- Very accurate for complementary products
- Doesn't need user profiles
- Works immediately for new users
- Based on actual purchase behavior (not just browsing)
Example 3: Spotify
What they use: Hybrid with collaborative + audio analysis + social
Three Recommendation Types:
A. Collaborative Filtering
Your playlists: [Pop, Rock, Indie]
Similar user's playlists: [Pop, Rock, Indie, Alternative]
→ Recommend Alternative music
B. Audio Analysis (Content-Based)
Song features analyzed:
- Tempo: 120 BPM
- Key: C Major
- Energy: High
- Valence (happiness): Medium
- Acousticness: Low
Find songs with similar audio features!
C. Social Signals
Your friends listen to:
- Artist X: 80% of friends
- Artist Y: 60% of friends
→ Recommend Artist X
Discover Weekly Playlist:
= 30% Collaborative (users like you)
+ 30% Audio similarity (songs like yours)
+ 20% New releases in your genres
+ 20% Social (what friends listen to)
Example 4: TikTok (The King!)
What they use: Engagement prediction model (ML-based)
How it works:
For each video, predict:
- Will user watch to the end? (Completion rate)
- Will user like it?
- Will user comment?
- Will user share?
- Will user follow creator?
Score =
(10 × Completion prediction) +
(5 × Like prediction) +
(8 × Comment prediction) +
(12 × Share prediction) +
(15 × Follow prediction)
Show videos with highest predicted score!
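The weighted engagement score can be sketched with the weights from the formula above; the prediction probabilities are invented inputs that a real system would get from its ML models:

```python
# Engagement weights from the formula above; predictions are probabilities (0-1)
WEIGHTS = {"completion": 10, "like": 5, "comment": 8, "share": 12, "follow": 15}

def engagement_score(predictions):
    """Weighted sum of predicted engagement probabilities."""
    return sum(WEIGHTS[k] * p for k, p in predictions.items())

video_a = {"completion": 0.9, "like": 0.4, "comment": 0.1,
           "share": 0.05, "follow": 0.02}
video_b = {"completion": 0.5, "like": 0.2, "comment": 0.05,
           "share": 0.01, "follow": 0.01}

ranked = sorted([("A", engagement_score(video_a)),
                 ("B", engagement_score(video_b))],
                key=lambda t: t[1], reverse=True)
print(ranked[0][0])  # video A ranks first
```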
Features considered:
Video features:
- Category/hashtags
- Music used
- Duration
- Captions
User features:
- Past liked categories
- Watch time patterns
- Engagement history
- Language preference
Interaction features:
- Time of day
- Device type
- Network speed
Why it's so addictive:
- Optimizes for ENGAGEMENT, not just relevance
- Learns quickly (every swipe teaches the algorithm)
- Heavy personalization (your feed is unique)
Chapter 5: Common Formulas Reference
1. Weighted Score (Most Common in Practice!)
Final Score = Σ(Weight_i × Score_i)
Example (E-commerce):
Product Score =
(0.35 × Social_Score) +
(0.25 × Engagement_Score) +
(0.20 × Personalization_Score) +
(0.15 × Recency_Score) +
(0.05 × Quality_Score)
Each component score is 0-100, normalized
2. Recency Decay
Recency Score = Base_Score × e^(-λ × time)
Where:
λ (lambda) = decay rate (how fast score decreases)
time = hours/days since creation
e = 2.71828 (natural logarithm base)
Example:
Base score = 100
ฮป = 0.1 (slow decay)
After 24 hours: 100 × e^(-0.1 × 24) = 100 × 0.091 = 9.1
Interpretation: Old content gets much lower score
Simpler Alternative (Step Function):
IF age < 1 hour: Score = 100
ELSE IF age < 6 hours: Score = 80
ELSE IF age < 24 hours: Score = 50
ELSE IF age < 7 days: Score = 20
ELSE: Score = 5
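Both decay variants side by side, transcribed directly from the formulas above:

```python
import math

def exponential_recency(base_score, hours, lam=0.1):
    """Continuous decay: score shrinks by e^(-lambda * age)."""
    return base_score * math.exp(-lam * hours)

def step_recency(hours):
    """Step-function alternative from the text (age in hours)."""
    if hours < 1:
        return 100
    elif hours < 6:
        return 80
    elif hours < 24:
        return 50
    elif hours < 24 * 7:
        return 20
    return 5

print(round(exponential_recency(100, 24), 1))  # ~9.1 after one day
print(step_recency(24))                        # 20 (1-7 days old)
```

The exponential version decays smoothly (no ranking cliffs at hour boundaries); the step version is cheaper to reason about and to cache.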
3. Engagement Rate
Engagement Rate =
(Likes + Comments + Shares) / Views
Example:
Video: 10,000 views, 500 likes, 50 comments, 30 shares
Engagement = (500 + 50 + 30) / 10,000 = 0.058 = 5.8%
Good engagement: > 5%
Viral content: > 15%
4. Click-Through Rate (CTR)
CTR = Clicks / Impressions
Example:
Product shown 1000 times
Clicked 50 times
CTR = 50/1000 = 0.05 = 5%
Use CTR to rank items:
Higher CTR = Better recommendation
5. Conversion Rate
Conversion Rate = Purchases / Clicks
Example:
Product clicked 100 times
Purchased 10 times
Conversion = 10/100 = 10%
Ultimate metric: Did recommendation lead to action?
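The three funnel metrics above, transcribed directly with the numbers from the examples:

```python
def engagement_rate(views, likes, comments, shares):
    """(Likes + Comments + Shares) / Views."""
    return (likes + comments + shares) / views

def ctr(clicks, impressions):
    """Click-through rate: Clicks / Impressions."""
    return clicks / impressions

def conversion_rate(purchases, clicks):
    """Conversion rate: Purchases / Clicks."""
    return purchases / clicks

# Numbers from the examples above
print(engagement_rate(10_000, 500, 50, 30))  # 0.058 -> 5.8%
print(ctr(50, 1000))                         # 0.05  -> 5%
print(conversion_rate(10, 100))              # 0.1   -> 10%
```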
Chapter 6: Choosing the Right System
Decision Framework
Use Content-Based When:
- ✅ Items have rich descriptions
- ✅ Few users (cold start)
- ✅ Need to explain recommendations
- ✅ Items change frequently
Examples: News articles, blog posts, jobs
Use Collaborative Filtering When:
- ✅ Lots of user interaction data
- ✅ Items don't have clear features
- ✅ Want to discover unexpected items
- ✅ Users have diverse tastes
Examples: Movies, music, products
Use Hybrid When:
- ✅ You have both item features AND user data
- ✅ Want best of both worlds
- ✅ Can handle complexity
- ✅ Need to solve cold start
Examples: E-commerce (like Amazon), streaming (like Netflix)
Add Social Signals When:
- ✅ Platform has social connections
- ✅ Social proof matters
- ✅ Viral/trending important
- ✅ Community-driven
Examples: Social commerce (like Nexgate)
Chapter 7: Learning Resources
Books (No Code!)
1. "Recommendation Systems: The Textbook" by Charu Aggarwal
- Comprehensive coverage
- Mathematical explanations
- Theory + Practice
- Best for deep understanding
2. "Practical Recommender Systems" by Kim Falk
- Real-world examples
- Less math, more intuition
- Case studies
- Best for beginners
3. "Programming Collective Intelligence" by Toby Segaran
- Intuitive explanations
- Simple examples
- Practical algorithms
- Best for implementation ideas
Online Courses
1. Coursera: "Recommender Systems" by University of Minnesota
- Free to audit
- Video lectures
- Covers all types
- Best structured course
2. YouTube: "StatQuest with Josh Starmer"
- Amazing explanations
- Visual animations
- Covers collaborative filtering, PCA, SVD
- Best for visual learners
3. Google's Machine Learning Crash Course
- Section on recommendations
- Interactive examples
- Free and well-designed
- Best for ML context
Papers (Foundational)
1. "Amazon.com Recommendations: Item-to-Item Collaborative Filtering"
- How Amazon does it
- Industry standard
- Very readable
- Must-read!
2. "The Netflix Prize" papers
- Competition that advanced the field
- Matrix factorization explained
- Real-world constraints
- Historical importance
3. "BPR: Bayesian Personalized Ranking"
- Modern ranking approach
- Implicit feedback (views, not ratings)
- Used by many companies
- Advanced but important
Websites
1. Towards Data Science (Medium)
- Blog posts explaining concepts
- Real-world case studies
- Beginner to advanced
- Free with email
2. Papers With Code
- Research papers + implementations
- See state-of-the-art methods
- Compare approaches
- Great for staying current
3. Google Research Blog
- How Google does recommendations
- YouTube algorithm explanations
- Cutting-edge research
- Straight from the source
Chapter 8: Working Example
Scenario: Recommend Products for Alice
Alice's History:
Bought: iPhone ($999), AirPods ($199), MacBook ($1299)
Viewed: iPad, Apple Watch, iPhone Case
Searched: "wireless earbuds", "laptop accessories"
Budget range: $150-1500
Available Products:
1. Apple Watch ($399)
2. iPad ($329)
3. Samsung Phone ($899)
4. Laptop Stand ($49)
5. Wireless Keyboard ($129)
6. iPhone Case ($29)
7. AirPods Pro ($249)
Method 1: Content-Based Scoring
Step 1: Define Item Features
Apple Watch:
- Brand: Apple (1)
- Category: Electronics (1)
- Price Range: Mid ($399, in her range ✅)
- Compatibility: iPhone (1)
Samsung Phone:
- Brand: Samsung (0 - she buys Apple)
- Category: Electronics (1)
- Price Range: High ($899 ✅)
- Compatibility: Android (0)
Step 2: Calculate Similarity
Apple Watch vs Alice's preferences:
Brand match: 100% (all Apple)
Category match: 100% (all electronics)
Price match: 80% (slightly lower than average)
Compatibility: 100% (has iPhone)
Similarity Score = (100 + 100 + 80 + 100) / 4 = 95%
Samsung Phone:
Brand match: 0%
Category match: 100%
Price match: 90%
Compatibility: 0%
Similarity Score = (0 + 100 + 90 + 0) / 4 = 47.5%
Ranking:
- Apple Watch (95%)
- AirPods Pro (92%)
- iPad (88%)
- Samsung Phone (47.5%)
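Although this chapter works the example on paper, the scoring arithmetic is easy to check with a tiny sketch; the feature-match percentages are the ones assumed above:

```python
def content_score(brand, category, price, compatibility):
    """Average of four feature-match percentages (0-100 each)."""
    return (brand + category + price + compatibility) / 4

apple_watch   = content_score(brand=100, category=100, price=80, compatibility=100)
samsung_phone = content_score(brand=0,   category=100, price=90, compatibility=0)

print(apple_watch)    # 95.0
print(samsung_phone)  # 47.5
```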
Method 2: Collaborative Filtering
Step 1: Find Similar Users
Alice bought: [iPhone, AirPods, MacBook]
Bob bought: [iPhone, AirPods, MacBook, Apple Watch]
Similarity: 3/3 common items = 100% overlap!
Carol bought: [iPhone, Samsung Phone, Android Tablet]
Similarity: 1/3 common items = 33% overlap
Dan bought: [Dell Laptop, Android Phone]
Similarity: 0/3 common items = 0% overlap
Step 2: Recommend What Similar Users Bought
Bob (100% similar) also bought:
→ Apple Watch ✅ Strong recommendation!
Carol (33% similar) also bought:
→ Samsung Phone (weak recommendation)
Dan (0% similar):
→ Ignore his purchases
Ranking:
- Apple Watch (Bob recommends, 100% similarity)
- iPad (viewed but not bought - weaker signal)
Method 3: Hybrid Approach (Best!)
Combine Both Methods:
Apple Watch:
- Content similarity: 95%
- Collaborative: 100% (Bob bought it)
- Final: (0.5 × 95) + (0.5 × 100) = 97.5 ⭐
iPad:
- Content similarity: 88%
- Collaborative: 50% (Alice viewed, no strong signal)
- Final: (0.5 × 88) + (0.5 × 50) = 69
Samsung Phone:
- Content similarity: 47.5%
- Collaborative: 33% (Carol bought, low similarity)
- Final: (0.5 × 47.5) + (0.5 × 33) = 40.25
Final Ranking:
- Apple Watch (97.5) ✅ Recommend this!
- AirPods Pro (92)
- iPad (69)
- Wireless Keyboard (55)
- Samsung Phone (40.25)
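The final blend can be verified with a short sketch; AirPods Pro and Wireless Keyboard are omitted because their collaborative scores are not given in the example:

```python
# (content %, collaborative %) per product, from the worked example
candidates = {
    "Apple Watch":   (95.0, 100.0),
    "iPad":          (88.0, 50.0),
    "Samsung Phone": (47.5, 33.0),
}

def hybrid(content, collab, w=0.5):
    """50/50 blend of content and collaborative scores."""
    return w * content + (1 - w) * collab

ranking = sorted(candidates.items(),
                 key=lambda kv: hybrid(*kv[1]), reverse=True)
for name, scores in ranking:
    print(name, hybrid(*scores))  # Apple Watch 97.5, iPad 69.0, Samsung 40.25
```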
Adding More Factors
Recency Boost:
Apple Watch: Released 2 months ago → +5 points
iPad: Released 6 months ago → +3 points
Samsung Phone: Released 2 years ago → +0 points
Updated scores:
1. Apple Watch (102.5)
2. AirPods Pro (92)
3. iPad (72)
Rating Boost:
Apple Watch: 4.8 stars, 10,000 reviews → +8 points
iPad: 4.7 stars, 8,000 reviews → +7 points
Samsung Phone: 4.5 stars, 5,000 reviews → +5 points
Final scores:
1. Apple Watch (110.5) ⭐⭐⭐
2. AirPods Pro (92)
3. iPad (79)
Key Takeaways
The Golden Rules
1. Simple Often Wins
- Don't need complex ML for good recommendations
- Weighted scoring can be 80% as effective
- Start simple, add complexity only if needed
2. Context Matters
- Time of day, device, and current intent change what is relevant
- The same user wants different recommendations in different moments
3. Multiple Signals Are Better
- Combine content + collaborative + social + popularity
- No single method is perfect
- Hybrid approaches work best in practice
4. Measure What Matters
- Track engagement, conversion, retention
- A/B test different approaches
- Optimize for business goals, not just accuracy
5. Cold Start Is Hard
- New users: Use popular items + content-based
- New items: Use content-based + social proof
- Have fallback strategies
Summary Cheat Sheet
┌─────────────────────────────────────────────┐
│        Recommendation Method Picker         │
└─────────────────────────────────────────────┘
Have item features? → Content-Based
Have user behavior data? → Collaborative
Have both? → Hybrid ✅
Social platform? → Add social signals
Need explainability? → Content-Based
Want serendipity? → Collaborative
Cold start problem? → Content-Based first,
then Collaborative
Popular approach: Weighted Hybrid
= (Weight × Content) + (Weight × Collab) +
(Weight × Social) + (Weight × Recency)
You now understand recommendation systems from first principles!
Next steps:
- Re-read sections that were unclear
- Draw diagrams to visualize concepts
- Work through more examples on paper
- Apply to your Nexgate platform design
Remember: The best recommendation system is one that works for YOUR specific use case and users!