Embeddings
Overview
This guide explains how to implement semantic search for vCon conversation content in Supabase PostgreSQL using the pgvector extension for vector similarity search.
Architecture
High-Level Flow
vCon Content → Embedding API → Vector (array of floats) → PostgreSQL (pgvector) → Similarity Search
Components
pgvector Extension - PostgreSQL extension for vector similarity
Embedding Service - OpenAI, Sentence Transformers, or custom
Vector Storage - New columns in Supabase tables
Similarity Functions - Cosine similarity, L2 distance, inner product
Indexes - HNSW or IVFFlat indexes for fast retrieval
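Concretely, the pipeline can be exercised end to end in a few lines. A minimal sketch using the vcon_embeddings table and search_vcons_semantic() function defined later in this guide, with a local Sentence Transformers model standing in for the embedding service (note the vector column must be sized to the model, here 384 dims):

from sentence_transformers import SentenceTransformer
from supabase import create_client

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim embeddings
supabase = create_client("your-supabase-url", "your-supabase-key")

def embed_and_store(vcon_id: str, text: str) -> None:
    """Embed one piece of content and store its vector."""
    supabase.table('vcon_embeddings').insert({
        'vcon_id': vcon_id,
        'content_type': 'subject',
        'content_text': text,
        'embedding': model.encode(text).tolist(),
        'embedding_model': 'all-MiniLM-L6-v2',
        'embedding_dimension': 384
    }).execute()

def semantic_search(query: str, limit: int = 10) -> list[dict]:
    """Embed the query, then rank stored vectors by cosine similarity."""
    return supabase.rpc('search_vcons_semantic', {
        'query_embedding': model.encode(query).tolist(),
        'match_threshold': 0.7,
        'match_count': limit
    }).execute().data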
Step 1: Enable pgvector Extension
In Supabase Dashboard
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
Verify Installation
SELECT * FROM pg_extension WHERE extname = 'vector';
Step 2: Add Vector Columns to Schema
Add Embedding Columns
-- Add vector columns to existing tables
ALTER TABLE vcons
ADD COLUMN subject_embedding vector(1536), -- OpenAI ada-002 dimension
ADD COLUMN subject_embedding_model TEXT DEFAULT 'text-embedding-ada-002',
ADD COLUMN subject_embedding_updated_at TIMESTAMPTZ;
ALTER TABLE dialog
ADD COLUMN content_embedding vector(1536),
ADD COLUMN content_embedding_model TEXT DEFAULT 'text-embedding-ada-002',
ADD COLUMN content_embedding_updated_at TIMESTAMPTZ;
ALTER TABLE analysis
ADD COLUMN summary_embedding vector(1536),
ADD COLUMN summary_embedding_model TEXT DEFAULT 'text-embedding-ada-002',
ADD COLUMN summary_embedding_updated_at TIMESTAMPTZ;
Alternative: Dedicated Embeddings Table
For more flexibility, create a separate embeddings table:
-- Dedicated embeddings table for all content types
CREATE TABLE vcon_embeddings (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
vcon_id UUID NOT NULL REFERENCES vcons(id) ON DELETE CASCADE,
-- What is being embedded
content_type TEXT NOT NULL, -- 'subject', 'dialog', 'analysis_summary'
content_reference TEXT, -- dialog_index, analysis_index, etc.
content_text TEXT NOT NULL, -- Original text that was embedded
-- The embedding
embedding vector(1536) NOT NULL,
embedding_model TEXT NOT NULL DEFAULT 'text-embedding-ada-002',
embedding_dimension INTEGER NOT NULL DEFAULT 1536,
-- Metadata
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- One embedding per (vcon, content type, reference). Note: Postgres treats
-- NULL content_reference values as distinct, so use a sentinel (e.g. '')
-- when you need upsert semantics for subject-level rows
CONSTRAINT vcon_embeddings_unique UNIQUE (vcon_id, content_type, content_reference)
);
-- Indexes for fast retrieval
CREATE INDEX idx_vcon_embeddings_vcon_id ON vcon_embeddings(vcon_id);
CREATE INDEX idx_vcon_embeddings_content_type ON vcon_embeddings(content_type);
Step 3: Create Vector Indexes
HNSW Index (Recommended for Most Cases)
-- HNSW index for approximate nearest neighbor search
-- Good for: general purpose, balanced performance
-- Note: a query only uses the index whose operator class matches its
-- distance operator, so create just the variant(s) you actually query with
CREATE INDEX ON vcon_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- For L2 distance
CREATE INDEX ON vcon_embeddings
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);
-- For inner product
CREATE INDEX ON vcon_embeddings
USING hnsw (embedding vector_ip_ops)
WITH (m = 16, ef_construction = 64);
HNSW Parameters:
m: Maximum connections per layer (default: 16; higher = more accurate but slower to build)
ef_construction: Size of the dynamic candidate list at build time (default: 64; higher = better recall)
IVFFlat Index (For Large Datasets)
-- IVFFlat index for large-scale search
-- Good for: 1M+ vectors, when you can tolerate some accuracy loss
CREATE INDEX ON vcon_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Adjust lists based on dataset size:
-- lists = rows / 1000 for small datasets (< 1M rows)
-- lists = sqrt(rows) for larger datasets
When to Use Which Index:
HNSW: < 1M vectors, need high accuracy, have memory
IVFFlat: > 1M vectors, can trade accuracy for speed
No Index: < 10K vectors (brute force is fast enough)
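The lists heuristics above are easy to encode when sizing an IVFFlat index; a small helper, assuming you know the approximate row count:

import math

def ivfflat_lists(row_count: int) -> int:
    """Pick the IVFFlat 'lists' parameter using the heuristics above."""
    if row_count < 1_000_000:
        return max(1, row_count // 1000)  # lists = rows / 1000
    return int(math.sqrt(row_count))      # lists = sqrt(rows)

# e.g. ivfflat_lists(500_000) == 500; ivfflat_lists(4_000_000) == 2000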
Step 4: Generate Embeddings
Option A: Using OpenAI API
import openai
from supabase import create_client

# Initialize clients
openai.api_key = "your-openai-key"
supabase = create_client("your-supabase-url", "your-supabase-key")

def generate_embedding(text: str, model: str = "text-embedding-ada-002") -> list[float]:
    """Generate embedding using OpenAI API."""
    response = openai.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding

def embed_vcon_content(vcon_id: str, subject: str, dialog_texts: list[str]):
    """Generate and store embeddings for vCon content."""
    # Embed subject
    if subject:
        subject_embedding = generate_embedding(subject)
        supabase.table('vcon_embeddings').upsert({
            'vcon_id': vcon_id,
            'content_type': 'subject',
            # Sentinel instead of NULL: Postgres treats NULLs as distinct,
            # so a NULL content_reference would bypass the unique constraint
            'content_reference': '',
            'content_text': subject,
            'embedding': subject_embedding,
            'embedding_model': 'text-embedding-ada-002',
            'embedding_dimension': 1536
        }, on_conflict='vcon_id,content_type,content_reference').execute()

    # Embed dialogs
    for idx, text in enumerate(dialog_texts):
        if text and text.strip():
            dialog_embedding = generate_embedding(text)
            supabase.table('vcon_embeddings').upsert({
                'vcon_id': vcon_id,
                'content_type': 'dialog',
                'content_reference': str(idx),
                'content_text': text,
                'embedding': dialog_embedding,
                'embedding_model': 'text-embedding-ada-002',
                'embedding_dimension': 1536
            }, on_conflict='vcon_id,content_type,content_reference').execute()

# Usage
embed_vcon_content(
    vcon_id="123e4567-e89b-12d3-a456-426614174000",
    subject="Customer support call regarding billing issue",
    dialog_texts=[
        "Hello, I have a question about my bill",
        "I can help with that. What's your account number?",
        "It's 12345. I was charged twice this month."
    ]
)
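The embeddings endpoint also accepts a list of inputs, so a vCon's dialog texts can be embedded in a single request instead of one call per utterance:

def generate_embeddings_batch(texts: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    """Embed several texts in one API call; results are returned in input order."""
    response = openai.embeddings.create(input=texts, model=model)
    return [item.embedding for item in response.data]

# One round-trip for all dialog texts
dialog_vectors = generate_embeddings_batch([
    "Hello, I have a question about my bill",
    "I can help with that. What's your account number?"
])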
Option B: Using Sentence Transformers (Local/Self-Hosted)
from sentence_transformers import SentenceTransformer
from supabase import create_client

# Load model (384-dim or 768-dim depending on model)
model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dimensions
supabase = create_client("your-supabase-url", "your-supabase-key")

def generate_embedding_local(text: str) -> list[float]:
    """Generate embedding using local Sentence Transformers model."""
    embedding = model.encode(text)
    return embedding.tolist()

# Update schema for 384-dim embeddings
# ALTER TABLE vcon_embeddings ALTER COLUMN embedding TYPE vector(384);
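encode() likewise accepts a list of sentences, which is far faster than one call per string when backfilling:

# Batch-encode locally; one vector per input text
texts = ["Hello, I have a question about my bill",
         "I can help with that. What's your account number?"]
vectors = model.encode(texts, batch_size=32)
embeddings = [v.tolist() for v in vectors]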
Option C: Batch Processing with Edge Functions (Preferred in this repo)
// Supabase Edge Function for batch embedding
import { serve } from 'https://deno.land/[email protected]/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

serve(async (req) => {
  const { vcon_id, texts } = await req.json()

  // Call the embedding API (generateEmbedding is defined elsewhere,
  // e.g. a fetch to OpenAI; see the production function referenced below)
  const embeddings = await Promise.all(
    texts.map((text: string) => generateEmbedding(text))
  )

  // Store in Supabase (supabase-js uses .from(), not .table())
  const supabase = createClient(
    Deno.env.get('SUPABASE_URL')!,
    Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
  )

  for (let i = 0; i < embeddings.length; i++) {
    await supabase.from('vcon_embeddings').insert({
      vcon_id,
      content_type: 'dialog',
      content_reference: String(i),
      content_text: texts[i],
      embedding: embeddings[i]
    })
  }

  return new Response(JSON.stringify({ success: true }), {
    headers: { 'Content-Type': 'application/json' }
  })
})
See docs/INGEST_AND_EMBEDDINGS.md for the production-ready function (supabase/functions/embed-vcons/index.ts), environment variables, and Cron scheduling. This repository standardizes on 384-dim embeddings to match the migrations and HNSW index.
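For ad-hoc testing from Python, the function can also be invoked through the Supabase Functions client; a sketch (the invoke signature below follows supabase-py; verify it against your client version):

# Invoke the batch-embedding Edge Function from Python
response = supabase.functions.invoke(
    "embed-vcons",
    invoke_options={
        "body": {
            "vcon_id": "123e4567-e89b-12d3-a456-426614174000",
            "texts": ["Hello, I have a question about my bill"]
        }
    }
)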
Step 5: Semantic Search Queries
Basic Cosine Similarity Search
-- Function to search by semantic similarity
CREATE OR REPLACE FUNCTION search_vcons_semantic(
query_embedding vector(1536),
match_threshold float DEFAULT 0.7,
match_count int DEFAULT 20
)
RETURNS TABLE (
vcon_id uuid,
content_type text,
content_text text,
similarity float
) AS $$
BEGIN
RETURN QUERY
SELECT
e.vcon_id,
e.content_type,
e.content_text,
1 - (e.embedding <=> query_embedding) AS similarity
FROM vcon_embeddings e
WHERE 1 - (e.embedding <=> query_embedding) > match_threshold
ORDER BY e.embedding <=> query_embedding
LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT * FROM search_vcons_semantic(
(SELECT embedding FROM vcon_embeddings WHERE id = 'some-id'), -- query embedding
0.7, -- minimum similarity
20 -- max results
);
Similarity Operators in pgvector
-- Cosine distance (most common for semantic search)
-- Returns 0-2, where 0 = identical, 1 = orthogonal, 2 = opposite
embedding1 <=> embedding2
-- L2 distance (Euclidean)
-- Returns >= 0, where 0 = identical
embedding1 <-> embedding2
-- Inner product (returns the negated inner product)
-- More similar = more negative result, so ORDER BY ascending works
embedding1 <#> embedding2
-- Cosine similarity (1 - cosine distance)
-- Returns -1 to 1, where 1 = identical, 0 = orthogonal, -1 = opposite
1 - (embedding1 <=> embedding2)
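To build intuition, the same quantities can be computed in plain numpy; a small sketch:

import numpy as np

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

cosine_similarity = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 0.5 here
cosine_distance = 1 - cosine_similarity   # what <=> returns
l2_distance = np.linalg.norm(a - b)       # what <-> returns
neg_inner_product = -(a @ b)              # what <#> returns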
Python Implementation
from supabase import create_client
import openai

class VConSemanticSearch:
    def __init__(self, supabase_url: str, supabase_key: str, openai_key: str):
        self.supabase = create_client(supabase_url, supabase_key)
        openai.api_key = openai_key

    def search(
        self,
        query: str,
        threshold: float = 0.7,
        limit: int = 20,
        content_types: list[str] | None = None  # reserved: the sample RPC above does not filter by content type
    ) -> list[dict]:
        """
        Semantic search for vCons.

        Args:
            query: Natural language search query
            threshold: Minimum similarity (0-1)
            limit: Max results
            content_types: Filter by content type ['subject', 'dialog', 'analysis_summary']

        Returns:
            List of matching results with similarity scores
        """
        # Generate query embedding
        query_embedding = openai.embeddings.create(
            input=query,
            model="text-embedding-ada-002"
        ).data[0].embedding

        # Build query
        rpc_params = {
            'query_embedding': query_embedding,
            'match_threshold': threshold,
            'match_count': limit
        }

        # Execute search
        results = self.supabase.rpc(
            'search_vcons_semantic',
            rpc_params
        ).execute()

        # Group by vCon
        vcon_results = {}
        for row in results.data:
            vcon_id = row['vcon_id']
            if vcon_id not in vcon_results:
                vcon_results[vcon_id] = {
                    'vcon_id': vcon_id,
                    'max_similarity': row['similarity'],
                    'matches': []
                }
            vcon_results[vcon_id]['matches'].append({
                'content_type': row['content_type'],
                'content_text': row['content_text'],
                'similarity': row['similarity']
            })

        # Sort by max similarity
        return sorted(
            vcon_results.values(),
            key=lambda x: x['max_similarity'],
            reverse=True
        )

# Usage
searcher = VConSemanticSearch(
    supabase_url="your-url",
    supabase_key="your-key",
    openai_key="your-openai-key"
)

results = searcher.search(
    query="customer frustrated about billing errors",
    threshold=0.75,
    limit=10
)

for result in results:
    print(f"vCon {result['vcon_id']}: {result['max_similarity']:.3f}")
    for match in result['matches']:
        print(f"  - {match['content_type']}: {match['content_text'][:100]}...")
Step 6: Hybrid Search (Semantic + Exact)
SQL Function for Hybrid Search with Tags-from-Attachments
CREATE OR REPLACE FUNCTION search_vcons_hybrid(
query_text text,
query_embedding vector(1536),
tag_filters jsonb DEFAULT '{}',
semantic_weight float DEFAULT 0.5,
match_threshold float DEFAULT 0.7,
match_count int DEFAULT 20
)
RETURNS TABLE (
vcon_id uuid,
subject text,
tags jsonb,
semantic_score float,
keyword_score float,
combined_score float
) AS $$
BEGIN
RETURN QUERY
WITH tags_kv AS (
SELECT a.vcon_id,
split_part(elem, ':', 1) AS key,
split_part(elem, ':', 2) AS value
FROM attachments a
CROSS JOIN LATERAL jsonb_array_elements_text(a.body::jsonb) AS elem
WHERE a.type = 'tags' AND a.encoding = 'json'
),
tags_agg AS (
SELECT vcon_id, jsonb_object_agg(key, value) AS tags
FROM tags_kv GROUP BY vcon_id
),
semantic_results AS (
SELECT
e.vcon_id,
MAX(1 - (e.embedding <=> query_embedding)) AS semantic_score
FROM vcon_embeddings e
WHERE 1 - (e.embedding <=> query_embedding) > match_threshold
GROUP BY e.vcon_id
),
keyword_results AS (
  SELECT
    v.id AS vcon_id,
    MAX(ts_rank_cd(
      setweight(to_tsvector('english', COALESCE(v.subject, '')), 'A') ||
      setweight(to_tsvector('english', COALESCE(d.body, '')), 'C') ||
      setweight(to_tsvector('english', COALESCE(a.body, '')), 'B'),
      plainto_tsquery('english', query_text)
    )) AS keyword_score
  FROM vcons v
  LEFT JOIN dialog d ON d.vcon_id = v.id
  LEFT JOIN analysis a ON a.vcon_id = v.id
  WHERE query_text IS NOT NULL
    AND (setweight(to_tsvector('english', COALESCE(v.subject, '')), 'A') ||
         setweight(to_tsvector('english', COALESCE(d.body, '')), 'C') ||
         setweight(to_tsvector('english', COALESCE(a.body, '')), 'B'))
        @@ plainto_tsquery('english', query_text)
  GROUP BY v.id  -- one score per vCon even when several dialogs/analyses match
)
SELECT
  v.id AS vcon_id,
  v.subject,
  ta.tags,
  COALESCE(sr.semantic_score, 0) AS semantic_score,
  COALESCE(kr.keyword_score, 0) AS keyword_score,
  (COALESCE(sr.semantic_score, 0) * semantic_weight +
   COALESCE(kr.keyword_score, 0) * (1 - semantic_weight)) AS combined_score
FROM vcons v
LEFT JOIN tags_agg ta ON ta.vcon_id = v.id
LEFT JOIN semantic_results sr ON v.id = sr.vcon_id
LEFT JOIN keyword_results kr ON v.id = kr.vcon_id
WHERE (tag_filters = '{}'::jsonb OR (ta.tags IS NOT NULL AND ta.tags @> tag_filters))
AND (sr.semantic_score IS NOT NULL OR kr.keyword_score IS NOT NULL)
ORDER BY combined_score DESC
LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
Python Implementation
def hybrid_search(
    self,
    semantic_query: str | None = None,
    keyword_query: str | None = None,
    tags: dict | None = None,
    semantic_weight: float = 0.5,
    threshold: float = 0.7,
    limit: int = 20
) -> list[dict]:
    """
    Hybrid search combining semantic and keyword search with tag filters.
    (Defined as a method on VConSemanticSearch above.)

    Args:
        semantic_query: Natural language query for semantic search
        keyword_query: Keywords for full-text search
        tags: Exact tag filters
        semantic_weight: Weight for semantic vs keyword (0-1)
        threshold: Minimum semantic similarity
        limit: Max results
    """
    # Generate embedding if semantic query provided
    query_embedding = None
    if semantic_query:
        query_embedding = openai.embeddings.create(
            input=semantic_query,
            model="text-embedding-ada-002"
        ).data[0].embedding

    # Execute hybrid search
    results = self.supabase.rpc('search_vcons_hybrid', {
        'query_text': keyword_query or semantic_query or '',
        'query_embedding': query_embedding,
        'tag_filters': tags or {},
        'semantic_weight': semantic_weight,
        'match_threshold': threshold,
        'match_count': limit
    }).execute()

    return results.data

# Usage
results = searcher.hybrid_search(
    semantic_query="customer unhappy with billing",
    tags={"department": "support", "status": "open"},
    semantic_weight=0.7,
    limit=10
)
Step 7: Automatic Embedding Generation
Trigger for Automatic Embedding on Insert
-- Function to automatically generate embeddings
CREATE OR REPLACE FUNCTION generate_vcon_embeddings()
RETURNS TRIGGER AS $$
BEGIN
-- Call Edge Function or external service to generate embeddings
-- This is a placeholder - net.http_post requires the pg_net extension;
-- the actual implementation would call your embedding service
PERFORM net.http_post(
url := current_setting('app.embedding_service_url'),
body := jsonb_build_object(
'vcon_id', NEW.id,
'subject', NEW.subject,
'metadata', NEW.metadata
)
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Trigger to automatically generate embeddings
CREATE TRIGGER trigger_generate_embeddings
AFTER INSERT OR UPDATE OF subject, metadata ON vcons
FOR EACH ROW
EXECUTE FUNCTION generate_vcon_embeddings();
Background Job for Batch Embedding
import asyncio
from datetime import datetime, timezone

async def embedding_worker():
    """Background worker to generate embeddings for new vCons."""
    while True:
        # Find vCons without embeddings
        result = (
            supabase.table('vcons')
            .select('id, subject')
            .is_('subject_embedding', 'null')
            .limit(100)
            .execute()
        )

        for vcon in result.data:
            if not vcon.get('subject'):
                continue  # nothing to embed
            try:
                # Generate embedding
                embedding = generate_embedding(vcon['subject'])

                # Update vCon
                supabase.table('vcons').update({
                    'subject_embedding': embedding,
                    'subject_embedding_updated_at': datetime.now(timezone.utc).isoformat()
                }).eq('id', vcon['id']).execute()
            except Exception as e:
                print(f"Error embedding vCon {vcon['id']}: {e}")

        # Wait before next batch
        await asyncio.sleep(60)

# Run worker
asyncio.run(embedding_worker())
Step 8: Performance Optimization
Query Optimization
-- Set search parameters for better performance
SET hnsw.ef_search = 100; -- Default is 40; increase for better recall at some speed cost
-- For IVFFlat
SET ivfflat.probes = 10; -- Default is 1; increase for better recall
Embedding Dimension Reduction
# Use smaller embedding models for better performance
# OpenAI text-embedding-3-small: 1536 dims (default) or can be reduced
# Sentence Transformers all-MiniLM-L6-v2: 384 dims

# Reduce OpenAI embedding dimensions (the vector column must match,
# e.g. ALTER TABLE ... ALTER COLUMN embedding TYPE vector(512))
response = openai.embeddings.create(
    input=text,
    model="text-embedding-3-small",
    dimensions=512  # Reduce from 1536 to 512
)
Caching Strategy
import hashlib
import json
import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

def get_cached_embedding(text: str) -> list[float] | None:
    """Get embedding from cache."""
    cache_key = f"emb:{hashlib.md5(text.encode()).hexdigest()}"
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    return None

def cache_embedding(text: str, embedding: list[float]):
    """Cache embedding for 7 days."""
    cache_key = f"emb:{hashlib.md5(text.encode()).hexdigest()}"
    redis_client.setex(cache_key, 604800, json.dumps(embedding))  # 7 days

def generate_embedding_with_cache(text: str) -> list[float]:
    """Generate embedding with caching."""
    cached = get_cached_embedding(text)
    if cached:
        return cached

    embedding = generate_embedding(text)
    cache_embedding(text, embedding)
    return embedding
Step 9: Monitoring & Maintenance
Track Embedding Coverage
-- Check embedding coverage
SELECT
COUNT(*) AS total_vcons,
COUNT(subject_embedding) AS vcons_with_subject_embedding,
COUNT(subject_embedding)::float / COUNT(*) * 100 AS coverage_percentage
FROM vcons;
-- Find vCons missing embeddings
SELECT id, subject, created_at
FROM vcons
WHERE subject_embedding IS NULL
ORDER BY created_at DESC
LIMIT 100;
Monitor Search Performance
-- Enable timing (psql meta-command)
\timing on
-- Test query performance
EXPLAIN ANALYZE
SELECT *
FROM search_vcons_semantic(
(SELECT embedding FROM vcon_embeddings WHERE id = 'test-id'),
0.7,
20
);
Index Maintenance
-- Rebuild the index if needed (use your index's actual name)
REINDEX INDEX vcon_embeddings_embedding_idx;
-- Vacuum to reclaim space
VACUUM ANALYZE vcon_embeddings;
Cost Considerations
OpenAI Embedding Costs
text-embedding-ada-002: $0.0001 per 1K tokens (~750 words)
text-embedding-3-small: $0.00002 per 1K tokens
text-embedding-3-large: $0.00013 per 1K tokens
Example Costs (at the ada-002 rate above, ~0.75 words per token):
100K vCons with 200-word subjects (~26.7M tokens): about $2.67
1M dialog messages averaging 50 words (~66.7M tokens): about $6.67
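The arithmetic generalizes into a one-line estimator; a sketch using the rates listed above:

def embedding_cost_usd(num_texts: int, avg_words: float, usd_per_1k_tokens: float = 0.0001) -> float:
    """Rough embedding cost: words -> tokens (~0.75 words per token), priced per 1K tokens."""
    tokens = num_texts * (avg_words / 0.75)
    return tokens / 1000 * usd_per_1k_tokens

print(embedding_cost_usd(100_000, 200))            # ~$2.67 with ada-002
print(embedding_cost_usd(1_000_000, 50, 0.00002))  # ~$1.33 with 3-small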
Self-Hosted Alternative
Use Sentence Transformers locally:
No API costs
Faster for batch processing
384-768 dimensions (vs 1536)
Slightly lower accuracy
Summary
Key Decisions
Embedding Model
OpenAI ada-002: Best accuracy, API costs
Sentence Transformers: Free, good accuracy, self-hosted
Storage Strategy
Dedicated embeddings table (recommended)
Embedded in existing tables (simpler)
Index Type
HNSW: < 1M vectors, high accuracy
IVFFlat: > 1M vectors, faster but less accurate
Search Strategy
Pure semantic: Best for natural language queries
Hybrid: Combine with tags and keywords for precision
Implementation Checklist
Enable the pgvector extension (Step 1)
Add vector columns or a dedicated vcon_embeddings table (Step 2)
Create an HNSW or IVFFlat index sized to your dataset (Step 3)
Generate embeddings via OpenAI, a local model, or an Edge Function (Step 4)
Create the semantic and hybrid search functions (Steps 5-6)
Automate embedding generation with a trigger or background worker (Step 7)
Tune ef_search/probes and cache embeddings (Step 8)
Monitor coverage and maintain indexes (Step 9)
This architecture provides production-ready semantic search for vCon conversations in Supabase! 🚀