Search & Query

Overview

The vCon MCP server provides four search tools with different capabilities, from simple filtering to advanced semantic search.

Available Search Tools

1. `search_vcons` - Basic Filter Search

Best for: Finding vCons by metadata (subject, parties, dates)

Searches:

Subject line
Party names, emails, phone numbers
Creation dates

Does NOT search:

Dialog content
Analysis content
Attachments

Example:

{
  "subject": "customer support",
  "party_name": "John Doe",
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 10
}

Returns: Complete vCon objects matching the filters
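
The filter arguments can also be assembled in code before they are passed to the tool. The following is a minimal sketch, not part of the server: `build_filter_args` is a hypothetical helper, and it uses only the parameter names shown in the example above. It drops unset filters and checks that the date parses as ISO 8601.

```python
import json
from datetime import datetime


def build_filter_args(subject=None, party_name=None, start_date=None, limit=10):
    """Assemble arguments for search_vcons, omitting filters that are unset.

    Hypothetical helper: only parameter names from the example above are used.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    if start_date is not None:
        # fromisoformat() rejects a trailing "Z" on older Python versions,
        # so convert it to an explicit UTC offset before validating.
        datetime.fromisoformat(start_date.replace("Z", "+00:00"))
    args = {
        "subject": subject,
        "party_name": party_name,
        "start_date": start_date,
        "limit": limit,
    }
    return {key: value for key, value in args.items() if value is not None}


args = build_filter_args(subject="customer support",
                         start_date="2024-01-01T00:00:00Z")
print(json.dumps(args, indent=2))
```

Validating the date on the client side catches malformed filters before a request is made.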

2. `search_vcons_content` - Keyword Search

Best for: Finding specific words or phrases in conversation content

Searches:

✅ Subject
✅ Dialog bodies (conversations, transcripts)
✅ Analysis bodies (summaries, sentiment, etc.)
✅ Party information (names, emails, phones)
❌ Attachments (not indexed for full-text search)

Features:

Full-text search with ranking
Typo tolerance via trigram indexing
Highlighted snippets in results
Tag filtering support
Date range filtering
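
To see why trigram indexing tolerates typos, consider the sketch below. It assumes a PostgreSQL pg_trgm-style measure (each word is padded, cut into 3-character windows, and two strings are compared by the overlap of their trigram sets). The source does not state which trigram implementation the server uses, and this version splits on whitespace only, so treat it as an illustration of the idea rather than the server's scoring.

```python
def trigrams(text):
    """Return the set of 3-character windows for each word in text."""
    grams = set()
    for word in text.lower().split():
        # Two leading spaces and one trailing space, following pg_trgm's padding.
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(similarity("refund", "refnud"))   # transposed letters still share trigrams
print(similarity("refund", "invoice"))  # an unrelated word shares none
```

A misspelled word keeps most of its leading trigrams, which is why it can still match the intended term.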

Example:

{
  "query": "billing issue refund",
  "tags": {"department": "sales"},
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 50
}

Returns: Ranked results with snippets showing where matches were found

Result format (content_type is one of "subject", "dialog", "analysis", or "party"):

{
  "success": true,
  "count": 5,
  "results": [
    {
      "vcon_id": "uuid",
      "content_type": "analysis",
      "content_index": 0,
      "relevance_score": 0.85,
      "snippet": "...regarding the billing issue and potential refund..."
    }
  ]
}
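
Because one vCon can match in several places (subject, dialog, analysis), a caller may want a single best hit per conversation. The sketch below assumes only the result fields shown above; `best_hit_per_vcon` is a hypothetical helper and the sample IDs are made up for illustration.

```python
def best_hit_per_vcon(response):
    """Keep the highest-scoring hit for each vCon, best first."""
    if not response.get("success"):
        return []
    best = {}
    for hit in response.get("results", []):
        current = best.get(hit["vcon_id"])
        if current is None or hit["relevance_score"] > current["relevance_score"]:
            best[hit["vcon_id"]] = hit
    return sorted(best.values(), key=lambda h: h["relevance_score"], reverse=True)


# Made-up response in the documented shape.
sample = {
    "success": True,
    "count": 3,
    "results": [
        {"vcon_id": "vcon-a", "content_type": "dialog", "content_index": 0,
         "relevance_score": 0.40, "snippet": "...refund..."},
        {"vcon_id": "vcon-b", "content_type": "subject", "content_index": 0,
         "relevance_score": 0.60, "snippet": "...billing..."},
        {"vcon_id": "vcon-a", "content_type": "analysis", "content_index": 1,
         "relevance_score": 0.85, "snippet": "...billing issue..."},
    ],
}

for hit in best_hit_per_vcon(sample):
    print(hit["vcon_id"], hit["content_type"], hit["relevance_score"])
```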

3. `search_vcons_semantic` - AI-Powered Semantic Search

Best for: Finding conversations by meaning, not just keywords

Searches:

✅ Subject (embedded)
✅ Dialog bodies (embedded)
✅ Analysis bodies with encoding='none' or NULL (embedded)
❌ Analysis with encoding='base64url' or encoding='json' (not embedded)
❌ Attachments (not embedded)

Features:

Finds conceptually similar content
Works across paraphrases and synonyms
AI embeddings using 384-dimensional vectors
Tag filtering support
Similarity threshold control

Requirements:

Embeddings must be generated first (see embedding documentation)
Currently requires pre-computed embedding vector (384 dimensions)

Example:

{
  "query": "customer angry about late delivery",
  "threshold": 0.7,
  "limit": 20
}

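Note: Automatic embedding generation from query text is not yet implemented, so a text-only request like the one above may fail with "Embedding generation not yet implemented". Until that changes, pass a pre-computed 384-dimensional vector in the embedding parameter, or use search_vcons_content for keyword-based search without embeddings.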

Returns: Similar conversations ranked by semantic similarity
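
How `threshold` interacts with ranking can be shown in a few lines. This sketch assumes the similarity score is the cosine similarity between the query vector and each stored vector; the source does not say which metric the server uses. Toy 3-dimensional vectors stand in for the 384-dimensional ones, and the candidate names are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_threshold(query_vec, candidates, threshold=0.7):
    """Drop candidates scoring below threshold, then rank the rest best first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = {
    "same-topic": [1.0, 0.0, 0.0],
    "related": [1.0, 1.0, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
}
for name, score in filter_by_threshold([1.0, 0.0, 0.0], candidates):
    print(f"{name}: {score:.3f}")
```

Raising the threshold trades recall for precision: fewer, closer matches are returned.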

4. `search_vcons_hybrid` - Combined Keyword + Semantic Search

Best for: Comprehensive search combining exact matches and conceptual similarity

Searches:

Everything from keyword search (subject, dialog, analysis, parties)
Everything from semantic search (embedded content)

Features:

Combines full-text and semantic search
Adjustable weighting between keyword and semantic results
Best of both worlds: exact matches + conceptual matches
Tag filtering support

Example:

{
  "query": "billing dispute",
  "semantic_weight": 0.6,
  "tags": {"priority": "high"},
  "limit": 30
}

Parameters:

semantic_weight: 0-1 (default 0.6)
- 0.0 = 100% keyword search
- 1.0 = 100% semantic search
- 0.6 = 60% semantic, 40% keyword (recommended)

Returns: Combined results with both keyword and semantic scores
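
The weighting described above is consistent with a simple linear blend of the two scores. The sketch below assumes that formula and that both scores are already on a 0-1 scale; the server's actual combination and normalization are not described in the source, and `hybrid_score` is a hypothetical name.

```python
def hybrid_score(keyword_score, semantic_score, semantic_weight=0.6):
    """Blend two 0-1 scores: semantic_weight for semantic, the rest for keyword."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return semantic_weight * semantic_score + (1.0 - semantic_weight) * keyword_score


# An exact keyword match with only moderate semantic similarity.
print(hybrid_score(keyword_score=1.0, semantic_score=0.5))

# The same hit when keywords are emphasised.
print(hybrid_score(keyword_score=1.0, semantic_score=0.5, semantic_weight=0.3))
```

Lowering `semantic_weight` favors results that contain the exact query terms; raising it favors paraphrases and related topics.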

What About Attachments?

Current Status

Attachments are NOT indexed for search in the current implementation.

Why?

Binary content: Many attachments contain binary data (PDFs, images, audio) that isn't suitable for text-based search
Encoding: Attachments with encoding='base64url' contain encoded data, not searchable text
Structured data: Attachments with encoding='json' contain structured data that produces poor quality embeddings

Special Case: Tags

Attachments of type tags with encoding='json' ARE used for filtering, but not for content search.

Example tags attachment:

{
  "type": "tags",
  "encoding": "json",
  "body": ["department:sales", "priority:high", "region:west"]
}

These tags can be used with the tags parameter in any search tool:

{
  "query": "customer complaint",
  "tags": {"department": "sales", "priority": "high"}
}
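
The tags attachment stores "key:value" strings, while the tags parameter takes a key/value object. The sketch below shows one way to relate the two. It assumes a vCon matches only when every requested pair is present (AND semantics), which the source does not state; `parse_tags` and `matches` are hypothetical helpers.

```python
def parse_tags(body):
    """Turn ["key:value", ...] into {"key": "value", ...}."""
    tags = {}
    for entry in body:
        key, sep, value = entry.partition(":")
        if sep:  # skip entries that have no "key:value" separator
            tags[key] = value
    return tags


def matches(tag_filter, attachment):
    """True when the attachment is a tags attachment holding every requested pair."""
    if attachment.get("type") != "tags" or attachment.get("encoding") != "json":
        return False
    tags = parse_tags(attachment["body"])
    return all(tags.get(key) == value for key, value in tag_filter.items())


attachment = {
    "type": "tags",
    "encoding": "json",
    "body": ["department:sales", "priority:high", "region:west"],
}
print(matches({"department": "sales", "priority": "high"}, attachment))
print(matches({"department": "support"}, attachment))
```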

Future Enhancements

Potential future support for attachment content search:

Text extraction: Extract text from PDFs, Word docs, etc.
Audio transcription: Transcribe audio attachments to searchable text
OCR: Extract text from images
Selective indexing: Index only attachments with text content

If you need to search attachment content, consider:

Extracting text and adding it as an analysis element
Adding a summary of attachment content as an analysis
Using attachment metadata in tags
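
As a sketch of the first two workarounds, text extracted from an attachment can be wrapped in an analysis element with encoding "none" so that both keyword and semantic search can see it. Only `body` and `encoding` are confirmed by this page; the `type`, `dialog`, and `vendor` fields and the "example-extractor" value are assumptions modeled on the vCon analysis object, and `make_text_analysis` is a hypothetical helper.

```python
import json


def make_text_analysis(extracted_text, dialog_index=0, vendor="example-extractor"):
    """Wrap extracted attachment text in a plain-text analysis element."""
    # Collapse the runs of whitespace that text extraction tends to leave behind.
    text = " ".join(extracted_text.split())
    if not text:
        raise ValueError("no text to index")
    return {
        "type": "summary",         # assumed analysis type
        "dialog": dialog_index,    # assumed reference to the related dialog
        "vendor": vendor,          # assumed producer label
        "body": text,
        "encoding": "none",        # plain text, so it is eligible for embedding
    }


analysis = make_text_analysis("Invoice  #A-17\n\nTotal due:   120.00 USD")
print(json.dumps(analysis, indent=2))
```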

Analysis Encoding and Search

Analysis Elements ARE Searchable

Analysis elements are included in search, with filtering based on encoding:

| Encoding | Keyword Search | Semantic Search | Notes |
|---|---|---|---|
| none or NULL | ✅ Yes | ✅ Yes | Plain text content, ideal for search |
| json | ✅ Yes | ❌ No | Included in keyword search only |
| base64url | ✅ Yes | ❌ No | Included in keyword search only |

Why Filter Semantic Search by Encoding?

Analysis with encoding='none' contains human-readable text like:

Conversation summaries
Transcriptions
Sentiment analysis results
Translation output
Natural language insights

These are ideal for semantic search because they contain meaningful natural language.

Analysis with encoding='json' or encoding='base64url' typically contains:

Structured data (poor quality embeddings)
Binary content (not suitable for embeddings)
Encoded data (not searchable as text)
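
The encoding rule above amounts to a one-line predicate. The sketch below is illustrative only: `is_embeddable` is a hypothetical name, and the sample analysis elements are made up.

```python
def is_embeddable(analysis):
    """True for analysis whose encoding is 'none' or missing (NULL)."""
    encoding = analysis.get("encoding")
    return encoding is None or encoding == "none"


analyses = [
    {"type": "summary", "encoding": "none", "body": "Customer asked for a refund."},
    {"type": "transcript", "body": "Hello, thanks for calling."},  # encoding is NULL
    {"type": "sentiment", "encoding": "json", "body": "{\"score\": 0.2}"},
    {"type": "audio_features", "encoding": "base64url", "body": "AAECAw"},
]

# Only the first two elements would be candidates for embedding.
for item in analyses:
    print(item["type"], is_embeddable(item))
```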

Search Comparison

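| Feature | search_vcons | search_vcons_content | search_vcons_semantic | search_vcons_hybrid |
|---|---|---|---|---|
| Subject | ✅ Filter | ✅ Search | ✅ Search | ✅ Search |
| Dialog | ❌ | ✅ Search | ✅ Search | ✅ Search |
| Analysis | ❌ | ✅ Search | ✅ (encoding=none) | ✅ All |
| Attachments | ❌ | ❌ | ❌ | ❌ |
| Party Info | ✅ Filter | ✅ Search | ❌ | ✅ Search |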

Best Practices

When to Use Each Tool

search_vcons: Quick metadata lookups
- "Find vCons for a party with a given email address"
- "Show me vCons from last week"
- "List vCons with subject containing 'urgent'"
search_vcons_content: Keyword-based content search
- "Find conversations mentioning 'refund'"
- "Search for 'technical support' in dialog"
- "Find analysis containing 'positive sentiment'"
search_vcons_semantic: Concept-based search
- "Find conversations where customer was unhappy"
- "Show me calls about payment issues"
- "Find similar conversations to this one"
search_vcons_hybrid: Comprehensive search
- "Find all billing-related conversations" (gets both exact matches and related topics)
- "Search for customer complaints" (finds variations and synonyms)
- Best when you want both precision and recall

Performance Tips

Use filters: Date ranges and tags can dramatically reduce search scope
Set appropriate limits: Start with smaller limits (10-20) for faster results
Choose the right tool: Don't use semantic search if keyword search is sufficient
Pre-generate embeddings: Semantic search requires embeddings to be generated beforehand

Generating Embeddings

For semantic and hybrid search to work effectively, you need to generate embeddings for your vCons.

See the following guides:

INGEST_AND_EMBEDDINGS.md - Complete guide to embedding generation
EMBEDDING_STRATEGY_UPGRADE.md - Details on which content is embedded

Quick start:

# Generate embeddings for all vCons
./scripts/backfill-embeddings.sh 500 2

# Check embedding coverage
psql "$DATABASE_URL" -f scripts/check-embedding-coverage.sql

Troubleshooting

"No results found" for content search

Check that the content exists in dialog or analysis
Try a simpler query (fewer words)
Use wildcards or partial words
Check date range filters

"Embedding generation not yet implemented"

Semantic search currently requires pre-computed embeddings
Use search_vcons_content for keyword search instead
Generate embeddings using the scripts in /scripts/

"Embedding must be 384 dimensions"

The system uses 384-dimensional embeddings
If you're providing embeddings, ensure they match this dimension
Use text-embedding-3-small with dimensions=384 (OpenAI)
Or use sentence-transformers/all-MiniLM-L6-v2 (Hugging Face)
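
A client-side check can catch a dimension mismatch before the request is sent. The sketch below is a minimal example; `validate_embedding` is a hypothetical helper, and its error text only mirrors the message quoted in this section.

```python
import math

EXPECTED_DIMENSIONS = 384


def validate_embedding(vector):
    """Return the vector as floats, or raise if it cannot be used for search."""
    if len(vector) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"Embedding must be {EXPECTED_DIMENSIONS} dimensions, got {len(vector)}"
        )
    for value in vector:
        # NaN and infinity cannot be compared meaningfully, so reject them early.
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError("Embedding values must be finite numbers")
    return [float(value) for value in vector]


try:
    validate_embedding([0.1] * 1536)  # e.g. a model's full-size output
except ValueError as error:
    print(error)
```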

Poor search results

For keyword search: Try simpler, more specific terms
For semantic search: Ensure embeddings are up to date
For hybrid search: Adjust semantic_weight parameter
Consider using tags to filter results

Examples

Find customer complaints in dialog

{
  "query": "customer complaint angry upset frustrated",
  "limit": 20
}

Find high-priority sales conversations

{
  "query": "pricing quote proposal",
  "tags": {
    "department": "sales",
    "priority": "high"
  },
  "start_date": "2024-01-01T00:00:00Z"
}

Hybrid search with keyword emphasis

{
  "query": "billing invoice payment",
  "semantic_weight": 0.3,
  "limit": 30
}

Find conversations similar to a specific vCon

1. Get the vCon's embedding from the database
2. Use it in semantic search:

{
  "embedding": [0.123, 0.456, ...],  // 384 dimensions
  "threshold": 0.75,
  "limit": 10
}
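
The two steps can be sketched as follows. The sketch assumes the stored embedding comes back from the database either as a list of numbers or as text in the bracketed form "[0.1,0.2,...]" (the text format used by pgvector); how this project stores and returns vectors is not shown here, and `similar_vcon_args` is a hypothetical helper.

```python
import json


def similar_vcon_args(stored_embedding, threshold=0.75, limit=10):
    """Build semantic search arguments from an embedding read from the database."""
    if isinstance(stored_embedding, str):
        # Bracketed, comma-separated text parses as a JSON array.
        vector = json.loads(stored_embedding)
    else:
        vector = list(stored_embedding)
    if len(vector) != 384:
        raise ValueError(f"Embedding must be 384 dimensions, got {len(vector)}")
    return {
        "embedding": [float(value) for value in vector],
        "threshold": threshold,
        "limit": limit,
    }


# Placeholder vector standing in for a value read from the database.
stored = "[" + ",".join(["0.5"] * 384) + "]"
args = similar_vcon_args(stored)
print(len(args["embedding"]), args["threshold"], args["limit"])
```

The source vCon will normally be its own closest match, so callers may want to drop it from the results.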

Related Documentation

QUICK_START.md - Getting started with vCon MCP
INGEST_AND_EMBEDDINGS.md - Embedding generation
SUPABASE_SEMANTIC_SEARCH_GUIDE.md - Database search implementation
