Embeddings
Overview
Architecture
High-Level Flow
vCon Content → Embedding API → Vector (array of floats) → PostgreSQL (pgvector) → Similarity Search
Components
Step 1: Enable pgvector Extension
In Supabase Dashboard
Verify Installation
Step 2: Add Vector Columns to Schema
Add Embedding Columns
Alternative: Dedicated Embeddings Table
Step 3: Create Vector Indexes
HNSW Index (Recommended for Most Cases)
IVFFlat Index (For Large Datasets)
Step 4: Generate Embeddings
Option A: Using OpenAI API
Option B: Using Sentence Transformers (Local/Self-Hosted)
Option C: Batch Processing with Edge Functions (Preferred in this repo)
Step 5: Semantic Search Queries
Basic Cosine Similarity Search
Similarity Operators in pgvector
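As a rough illustration of what the three most commonly used pgvector operators compute, the sketch below reimplements them in plain Python: `<->` (L2 distance), `<#>` (negative inner product), and `<=>` (cosine distance). The function names are illustrative only and are not part of pgvector; in practice the database evaluates these operators itself.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the value pgvector's `<->` operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, the value pgvector's `<#>` operator returns.

    The sign is flipped so that, like the other operators, smaller means closer.
    """
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, the value pgvector's `<=>` operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

All three are distances, so an `ORDER BY ... ASC` query returns the nearest rows first. A similarity score in the usual sense is obtained as `1 - cosine_distance`.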
Python Implementation
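A minimal sketch of how a cosine similarity query might be assembled from Python follows. It only builds the SQL text and parameters, so it runs without a database. The table name `vcons`, the column name `embedding`, and both helper functions are assumptions made for illustration. The `%s` placeholders follow the psycopg parameter style.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_similarity_query(embedding, limit=10, table="vcons", column="embedding"):
    """Return (sql, params) for a nearest-neighbour search by cosine distance.

    `table` and `column` are hypothetical names; they are interpolated into
    the SQL text, so they must be trusted constants and never user input.
    """
    sql = (
        f"SELECT id, 1 - ({column} <=> %s::vector) AS similarity "
        f"FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"ORDER BY {column} <=> %s::vector "
        f"LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    # The vector is passed twice: once for the score, once for the ordering.
    return sql, (literal, literal, limit)
```

The returned pair can be handed to a database cursor, for example `cursor.execute(sql, params)`. Ordering by the raw `<=>` expression, rather than by the derived `similarity` column, is what lets a vector index be used.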
Step 6: Hybrid Search (Semantic + Exact)
SQL Function for Hybrid Search with Tags-from-Attachments
Python Implementation
Step 7: Automatic Embedding Generation
Trigger for Automatic Embedding on Insert
Background Job for Batch Embedding
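One possible shape for a batch embedding job is sketched below. It assumes the rows still missing an embedding have already been fetched as `(id, text)` pairs, and that `embed_batch_fn` is a caller-supplied function that maps a list of texts to a list of vectors. Both names are hypothetical placeholders for whatever provider and storage layer is in use.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_pending(pending, embed_batch_fn, batch_size=100):
    """Embed (id, text) pairs in batches and return a dict of id -> vector.

    `embed_batch_fn` is a hypothetical callable: list[str] -> list[list[float]].
    """
    results = {}
    for batch in batched(pending, batch_size):
        ids = [row_id for row_id, _ in batch]
        texts = [text for _, text in batch]
        vectors = embed_batch_fn(texts)
        if len(vectors) != len(texts):
            raise RuntimeError("embedding function returned a mismatched batch")
        results.update(zip(ids, vectors))
    return results
```

Sending texts in batches reduces the number of round trips to the embedding provider. The length check guards against silently attaching a vector to the wrong row.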
Step 8: Performance Optimization
Query Optimization
Embedding Dimension Reduction
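One simple reduction technique, shown here only as an assumption about what this step could involve, is to keep the leading dimensions of each vector and renormalize to unit length. This is only appropriate for embedding models trained so that the leading dimensions carry most of the information; for other models, search quality may degrade noticeably. The function name is illustrative.

```python
import math


def truncate_embedding(embedding, target_dim):
    """Keep the first `target_dim` values and rescale the result to unit L2 norm."""
    if target_dim <= 0 or target_dim > len(embedding):
        raise ValueError("target_dim must be between 1 and len(embedding)")
    head = list(embedding[:target_dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]
```

Stored vectors and query vectors must be reduced the same way, and the vector column and its index must be declared with the reduced dimension.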
Caching Strategy
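A caching layer can avoid paying for the same embedding twice. The sketch below keys an in-memory cache on a SHA-256 hash of the text. The `EmbeddingCache` class and the `embed_fn` callable are hypothetical, and a production setup would more likely use a shared store than a per-process dictionary.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache of embeddings keyed by a hash of the input text."""

    def __init__(self, embed_fn):
        # embed_fn is a hypothetical callable: str -> list[float]
        self._embed_fn = embed_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        """Return the cached vector for `text`, computing it on first use."""
        key = self._key(text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        vector = self._embed_fn(text)
        self._store[key] = vector
        return vector
```

Repeated search queries are the most likely beneficiaries, since identical query strings then cost one embedding call instead of many.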
Step 9: Monitoring & Maintenance
Track Embedding Coverage
Monitor Search Performance
Index Maintenance
Cost Considerations
OpenAI Embedding Costs
Self-Hosted Alternative
Summary
Key Decisions
Implementation Checklist
Last updated