Search & Query

Overview

The vCon MCP server provides four search tools with different capabilities, from simple filtering to advanced semantic search.

Available Search Tools

Best for: Finding vCons by metadata (subject, parties, dates)

Searches:

  • Subject line

  • Party names, emails, phone numbers

  • Creation dates

Does NOT search:

  • Dialog content

  • Analysis content

  • Attachments

Example:

{
  "subject": "customer support",
  "party_name": "John Doe",
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 10
}

Returns: Complete vCon objects matching the filters


Best for: Finding specific words or phrases in conversation content

Searches:

  • ✅ Subject

  • ✅ Dialog bodies (conversations, transcripts)

  • ✅ Analysis bodies (summaries, sentiment, etc.)

  • ✅ Party information (names, emails, phones)

  • ❌ Attachments (not indexed for full-text search)

Features:

  • Full-text search with ranking

  • Typo tolerance via trigram indexing

  • Highlighted snippets in results

  • Tag filtering support

  • Date range filtering

Example:

{
  "query": "billing issue refund",
  "tags": {"department": "sales"},
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 50
}

Returns: Ranked results with snippets showing where matches were found

Result format:

{
  "success": true,
  "count": 5,
  "results": [
    {
      "vcon_id": "uuid",
      "content_type": "analysis",  // or "subject", "dialog", "party"
      "content_index": 0,
      "relevance_score": 0.85,
      "snippet": "...regarding the billing issue and potential refund..."
    }
  ]
}

Best for: Finding conversations by meaning, not just keywords

Searches:

  • ✅ Subject (embedded)

  • ✅ Dialog bodies (embedded)

  • ✅ Analysis bodies with encoding='none' or NULL (embedded)

  • ❌ Analysis with encoding='base64url' or encoding='json' (not embedded)

  • ❌ Attachments (not embedded)

Features:

  • Finds conceptually similar content

  • Works across paraphrases and synonyms

  • AI embeddings using 384-dimensional vectors

  • Tag filtering support

  • Similarity threshold control

Requirements:

  • Embeddings must be generated first (see embedding documentation)

  • Currently requires pre-computed embedding vector (384 dimensions)

Example:

{
  "query": "customer angry about late delivery",
  "threshold": 0.7,
  "limit": 20
}

Note: Automatic embedding generation from query text is not yet implemented. Use search_vcons_content for keyword-based search without embeddings.

Returns: Similar conversations ranked by semantic similarity


Best for: Comprehensive search combining exact matches and conceptual similarity

Searches:

  • Everything from keyword search (subject, dialog, analysis, parties)

  • Everything from semantic search (embedded content)

Features:

  • Combines full-text and semantic search

  • Adjustable weighting between keyword and semantic results

  • Best of both worlds: exact matches + conceptual matches

  • Tag filtering support

Example:

{
  "query": "billing dispute",
  "semantic_weight": 0.6,
  "tags": {"priority": "high"},
  "limit": 30
}

Parameters:

  • semantic_weight: 0-1 (default 0.6)

    • 0.0 = 100% keyword search

    • 1.0 = 100% semantic search

    • 0.6 = 60% semantic, 40% keyword (recommended)

Returns: Combined results with both keyword and semantic scores


What About Attachments?

Current Status

Attachments are NOT indexed for search in the current implementation.

Why?

  1. Binary content: Many attachments contain binary data (PDFs, images, audio) that isn't suitable for text-based search

  2. Encoding: Attachments with encoding='base64url' contain encoded data, not searchable text

  3. Structured data: Attachments with encoding='json' contain structured data that produces poor quality embeddings

Special Case: Tags

Attachments of type tags with encoding='json' ARE used for filtering, but not for content search.

Example tags attachment:

{
  "type": "tags",
  "encoding": "json",
  "body": ["department:sales", "priority:high", "region:west"]
}

These tags can be used with the tags parameter in any search tool:

{
  "query": "customer complaint",
  "tags": {"department": "sales", "priority": "high"}
}

Future Enhancements

Potential future support for attachment content search:

  1. Text extraction: Extract text from PDFs, Word docs, etc.

  2. Audio transcription: Transcribe audio attachments to searchable text

  3. OCR: Extract text from images

  4. Selective indexing: Index only attachments with text content

If you need to search attachment content, consider:

  1. Extracting text and adding it as an analysis element

  2. Adding a summary of attachment content as an analysis

  3. Using attachment metadata in tags


Analysis Elements ARE Searchable

Analysis elements are included in search, with filtering based on encoding:

Encoding
Keyword Search
Semantic Search
Notes

none or NULL

✅ Yes

✅ Yes

Plain text content, ideal for search

json

✅ Yes

❌ No

Included in keyword search only

base64url

✅ Yes

❌ No

Included in keyword search only

Why Filter Semantic Search by Encoding?

Analysis with encoding='none' contains human-readable text like:

  • Conversation summaries

  • Transcriptions

  • Sentiment analysis results

  • Translation output

  • Natural language insights

These are ideal for semantic search because they contain meaningful natural language.

Analysis with encoding='json' or encoding='base64url' typically contains:

  • Structured data (poor quality embeddings)

  • Binary content (not suitable for embeddings)

  • Encoded data (not searchable as text)


Search Comparison

Feature
search_vcons
search_vcons_content
search_vcons_semantic
search_vcons_hybrid

Subject

✅ Filter

✅ Search

✅ Search

✅ Search

Dialog

✅ Search

✅ Search

✅ Search

Analysis

✅ Search

✅ (encoding=none)

✅ All

Attachments

Party Info

✅ Filter

✅ Search

✅ Search

Tags

✅ Filter

✅ Filter

✅ Filter

Ranking

✅ Relevance

✅ Similarity

✅ Combined

Snippets

✅ Yes

Requires Embeddings

⚠️ Optional


Best Practices

When to Use Each Tool

  1. search_vcons: Quick metadata lookups

    • "Find vCons with party email [email protected]"

    • "Show me vCons from last week"

    • "List vCons with subject containing 'urgent'"

  2. search_vcons_content: Keyword-based content search

    • "Find conversations mentioning 'refund'"

    • "Search for 'technical support' in dialog"

    • "Find analysis containing 'positive sentiment'"

  3. search_vcons_semantic: Concept-based search

    • "Find conversations where customer was unhappy"

    • "Show me calls about payment issues"

    • "Find similar conversations to this one"

  4. search_vcons_hybrid: Comprehensive search

    • "Find all billing-related conversations" (gets both exact matches and related topics)

    • "Search for customer complaints" (finds variations and synonyms)

    • Best when you want both precision and recall

Performance Tips

  1. Use filters: Date ranges and tags can dramatically reduce search scope

  2. Set appropriate limits: Start with smaller limits (10-20) for faster results

  3. Choose the right tool: Don't use semantic search if keyword search is sufficient

  4. Pre-generate embeddings: Semantic search requires embeddings to be generated beforehand


Generating Embeddings

For semantic and hybrid search to work effectively, you need to generate embeddings for your vCons.

See the following guides:

Quick start:

# Generate embeddings for all vCons
./scripts/backfill-embeddings.sh 500 2

# Check embedding coverage
psql $DATABASE_URL -f scripts/check-embedding-coverage.sql

Troubleshooting

  • Check that the content exists in dialog or analysis

  • Try a simpler query (fewer words)

  • Use wildcards or partial words

  • Check date range filters

"Embedding generation not yet implemented"

  • Semantic search currently requires pre-computed embeddings

  • Use search_vcons_content for keyword search instead

  • Generate embeddings using the scripts in /scripts/

"Embedding must be 384 dimensions"

  • The system uses 384-dimensional embeddings

  • If you're providing embeddings, ensure they match this dimension

  • Use text-embedding-3-small with dimensions=384 (OpenAI)

  • Or use sentence-transformers/all-MiniLM-L6-v2 (Hugging Face)

Poor search results

  • For keyword search: Try simpler, more specific terms

  • For semantic search: Ensure embeddings are up to date

  • For hybrid search: Adjust semantic_weight parameter

  • Consider using tags to filter results


Examples

Find customer complaints in dialog

{
  "query": "customer complaint angry upset frustrated",
  "limit": 20
}

Find high-priority sales conversations

{
  "query": "pricing quote proposal",
  "tags": {
    "department": "sales",
    "priority": "high"
  },
  "start_date": "2024-01-01T00:00:00Z"
}

Hybrid search with keyword emphasis

{
  "query": "billing invoice payment",
  "semantic_weight": 0.3,
  "limit": 30
}

Find conversations similar to a specific vCon

  1. Get the vCon's embedding from the database

  2. Use it in semantic search:

{
  "embedding": [0.123, 0.456, ...],  // 384 dimensions
  "threshold": 0.75,
  "limit": 10
}

Last updated