# Search & Query

## Overview

The vCon MCP server exposes a recommended unified search tool, `vcon_search`, alongside the older specialized search tools.

For new clients, prefer `vcon_search` first. It gives you one predictable response envelope, explicit `include` groups, cursor pagination, and response-size budgeting. The older search tools are still available and still useful for compatibility or specialized flows.

## Recommended Starting Point

### `vcon_search` - Unified Search

**Best for:** New clients that want one search entry point with predictable parsing

**Modes:**

* `metadata`
* `keyword`
* `semantic`
* `hybrid`

**Key advantages:**

* Stable `{ok, items, page}` envelope
* Explicit `include` groups such as `core`, `summary`, `tags`, and `dealer`
* Cursor pagination instead of mixed ad hoc paging behavior
* Explicit `max_response_bytes` so oversized responses fail loudly with `RESPONSE_TOO_LARGE`

**Example:**

```json
{
  "mode": "keyword",
  "query": "billing dispute",
  "filters": {
    "tags": {
      "portal": "negative_experience"
    }
  },
  "include": ["core", "summary", "dealer", "tags"],
  "limit": 25,
  "max_response_bytes": 120000
}
```
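Cursor pagination can be consumed with a small client-side loop. The Python sketch below is illustrative only: `call_tool` stands in for whatever MCP client call you use, and the `cursor` request field and `page.next_cursor` response field are assumed names, not confirmed by this guide. Check the tool schema or `vcon_capabilities` for the real ones.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed JSON response
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_search_items(call_tool: ToolCall, request: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every item across all pages of a vcon_search query.

    Assumes the request accepts a `cursor` field and the response `page`
    object carries `next_cursor` until results are exhausted (both assumed names).
    """
    cursor = None
    while True:
        payload = dict(request)  # copy so the caller's request is not mutated
        if cursor is not None:
            payload["cursor"] = cursor
        response = call_tool("vcon_search", payload)
        if not response.get("ok"):
            raise RuntimeError(f"vcon_search failed: {response}")
        yield from response.get("items", [])
        cursor = (response.get("page") or {}).get("next_cursor")
        if not cursor:
            break
```

Because the function is a generator, a caller can stop iterating early without fetching the remaining pages.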

**Typical companion tools:**

* `vcon_capabilities` to discover supported modes, includes, and byte budgets
* `vcon_taxonomy` to discover the portal taxonomy and preferred dealer source
* `vcon_fetch` to expand a selected result with additional include groups
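
Since oversized responses fail with `RESPONSE_TOO_LARGE` rather than being truncated, a client may want to retry with a smaller page. The following Python sketch halves `limit` until the response fits. The error shape `{"ok": false, "error": {"code": ...}}` and the `call_tool` helper are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed JSON response
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def search_within_budget(call_tool: ToolCall, request: Dict[str, Any], min_limit: int = 1) -> Dict[str, Any]:
    """Retry vcon_search with a halved `limit` while the server reports RESPONSE_TOO_LARGE.

    The error envelope {"ok": False, "error": {"code": ...}} is an assumed shape.
    """
    payload = dict(request)
    while True:
        response = call_tool("vcon_search", payload)
        code = (response.get("error") or {}).get("code")
        if response.get("ok") or code != "RESPONSE_TOO_LARGE":
            return response  # success, or an error that shrinking will not fix
        limit = int(payload.get("limit", 0))
        if limit <= min_limit:
            return response  # cannot shrink further; surface the error to the caller
        payload["limit"] = max(min_limit, limit // 2)
```

Dropping heavier `include` groups and expanding selected results later with `vcon_fetch` is another way to stay under the budget.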

## Legacy Search Tools

The tools below remain supported and useful. They are especially helpful for older clients, direct low-level access, or cases where you intentionally want their narrower behavior.

## Available Search Tools

### 1. `search_vcons` - Basic Filter Search

**Best for:** Finding vCons by metadata (subject, parties, dates)

**Searches:**

* Subject line
* Party names, emails, phone numbers
* Creation dates

**Does NOT search:**

* Dialog content
* Analysis content
* Attachments

**Example:**

```json
{
  "subject": "customer support",
  "party_name": "John Doe",
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 10
}
```

**Returns:** Complete vCon objects matching the filters
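
Date filters use ISO 8601 timestamps in UTC, as in the example above. A minimal Python sketch for building a "last N days" request follows; the helper name is hypothetical and only the `start_date` and `limit` fields shown in the example are used.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def recent_vcons_filter(days: int, now: Optional[datetime] = None, limit: int = 10) -> Dict[str, Any]:
    """Build a search_vcons request for vCons created in the last `days` days.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    start = now.astimezone(timezone.utc) - timedelta(days=days)
    return {
        "start_date": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "limit": limit,
    }
```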

***

### 2. `search_vcons_content` - Keyword Search

**Best for:** Finding specific words or phrases in conversation content

**Searches:**

* ✅ Subject
* ✅ Dialog bodies (conversations, transcripts)
* ✅ Analysis bodies (summaries, sentiment, etc.)
* ✅ Party information (names, emails, phones)
* ❌ Attachments (not indexed for full-text search)

**Features:**

* Full-text search with ranking
* Typo tolerance via trigram indexing
* Highlighted snippets in results
* Tag filtering support
* Date range filtering

**Example:**

```json
{
  "query": "billing issue refund",
  "tags": {"department": "sales"},
  "start_date": "2024-01-01T00:00:00Z",
  "limit": 50
}
```

**Returns:** Ranked results with snippets showing where matches were found

**Result format:**

`content_type` is one of `"subject"`, `"dialog"`, `"analysis"`, or `"party"`.

```json
{
  "success": true,
  "count": 5,
  "results": [
    {
      "vcon_id": "uuid",
      "content_type": "analysis",
      "content_index": 0,
      "relevance_score": 0.85,
      "snippet": "...regarding the billing issue and potential refund..."
    }
  ]
}
```
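
Because each result is a single matching element, one vCon can appear several times (for example, once for a dialog hit and once for an analysis hit). The Python sketch below, using only the fields shown in the result format, keeps the best-scoring hit per vCon. The function name is illustrative.

```python
from typing import Any, Dict, List


def best_hit_per_vcon(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep the highest-scoring hit for each vCon, ordered by score descending."""
    best: Dict[str, Dict[str, Any]] = {}
    for hit in response.get("results", []):
        current = best.get(hit["vcon_id"])
        if current is None or hit["relevance_score"] > current["relevance_score"]:
            best[hit["vcon_id"]] = hit
    return sorted(best.values(), key=lambda h: h["relevance_score"], reverse=True)
```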

***

### 3. `search_vcons_semantic` - AI-Powered Semantic Search

**Best for:** Finding conversations by meaning, not just keywords

**Searches:**

* ✅ Subject (embedded)
* ✅ Dialog bodies (embedded)
* ✅ Analysis bodies with `encoding='none'` or `NULL` (embedded)
* ❌ Analysis with `encoding='base64url'` or `encoding='json'` (not embedded)
* ❌ Attachments (not embedded)

**Features:**

* Finds conceptually similar content
* Works across paraphrases and synonyms
* AI embeddings using 384-dimensional vectors
* Tag filtering support
* Similarity threshold control

**Requirements:**

* Embeddings must be generated first (see embedding documentation)
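* Currently requires a pre-computed embedding vector (384 dimensions) passed as `embedding`; a text `query` on its own may be rejected with "Embedding generation not yet implemented" (see Troubleshooting)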

**Example:**

```json
{
  "query": "customer angry about late delivery",
  "threshold": 0.7,
  "limit": 20
}
```

**Returns:** Similar conversations ranked by semantic similarity
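
To build intuition for `threshold`, the Python sketch below scores two vectors with cosine similarity and applies a cutoff. Cosine similarity is an assumption here; this guide does not state which similarity measure the server uses, so treat the numbers as illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def passes_threshold(query_vec: Sequence[float], doc_vec: Sequence[float], threshold: float = 0.7) -> bool:
    """Return True when the similarity meets or exceeds the threshold."""
    return cosine_similarity(query_vec, doc_vec) >= threshold
```

Under this assumption, raising the threshold returns fewer but closer matches, and lowering it widens recall.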

***

### 4. `search_vcons_hybrid` - Combined Keyword + Semantic Search

**Best for:** Comprehensive search combining exact matches and conceptual similarity

**Searches:**

* Everything from keyword search (subject, dialog, analysis, parties)
* Everything from semantic search (embedded content)

**Features:**

* Combines full-text and semantic search
* Adjustable weighting between keyword and semantic results
* Best of both worlds: exact matches + conceptual matches
* Tag filtering support

**Example:**

```json
{
  "query": "billing dispute",
  "semantic_weight": 0.6,
  "tags": {"priority": "high"},
  "limit": 30
}
```

**Parameters:**

* `semantic_weight`: 0-1 (default 0.6)
  * 0.0 = 100% keyword search
  * 1.0 = 100% semantic search
  * 0.6 = 60% semantic, 40% keyword (recommended)

**Returns:** Combined results with both keyword and semantic scores
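
One way to read `semantic_weight` is as a linear blend of the two scores. The Python sketch below assumes that interpretation and that both scores are already on a comparable 0-1 scale; the server's actual scoring formula is not documented here and may differ.

```python
def blend_scores(keyword_score: float, semantic_score: float, semantic_weight: float = 0.6) -> float:
    """Linear blend: semantic_weight of the semantic score, the rest from keyword.

    This mirrors the "60% semantic, 40% keyword" description; it is an
    illustrative assumption, not the server's confirmed formula.
    """
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return semantic_weight * semantic_score + (1.0 - semantic_weight) * keyword_score
```

For example, with a keyword score of 0.9 and a semantic score of 0.5, the default weight gives 0.6 × 0.5 + 0.4 × 0.9 = 0.66.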

***

## What About Attachments?

### Current Status

**Attachments are NOT indexed for search** in the current implementation.

**Why?**

1. **Binary content**: Many attachments contain binary data (PDFs, images, audio) that isn't suitable for text-based search
2. **Encoding**: Attachments with `encoding='base64url'` contain encoded data, not searchable text
3. **Structured data**: Attachments with `encoding='json'` contain structured data that produces poor quality embeddings

### Special Case: Tags

Attachments of type `tags` with `encoding='json'` ARE used for filtering, but not for content search.

Example tags attachment:

```json
{
  "type": "tags",
  "encoding": "json",
  "body": ["department:sales", "priority:high", "region:west"]
}
```

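These tags can be used with the `tags` parameter in the search tools that support tag filtering (`search_vcons_content`, `search_vcons_semantic`, and `search_vcons_hybrid`); `vcon_search` accepts them under `filters.tags`: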

```json
{
  "query": "customer complaint",
  "tags": {"department": "sales", "priority": "high"}
}
```
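
The tags attachment stores `key:value` strings, while the `tags` parameter takes an object. A small Python sketch for converting one into the other is shown below; the function name is illustrative.

```python
from typing import Dict, Iterable


def tags_body_to_filter(body: Iterable[str]) -> Dict[str, str]:
    """Convert ["key:value", ...] tag strings into a {"key": "value"} filter."""
    tags: Dict[str, str] = {}
    for entry in body:
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = entry.partition(":")
        if not sep or not key:
            continue  # skip entries that are not in key:value form
        tags[key.strip()] = value.strip()
    return tags
```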

### Future Enhancements

Potential future support for attachment content search:

1. **Text extraction**: Extract text from PDFs, Word docs, etc.
2. **Audio transcription**: Transcribe audio attachments to searchable text
3. **OCR**: Extract text from images
4. **Selective indexing**: Index only attachments with text content

If you need to search attachment content, consider:

1. Extracting text and adding it as an analysis element
2. Adding a summary of attachment content as an analysis
3. Using attachment metadata in tags
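
As a rough sketch of the first workaround, the Python helper below wraps already-extracted attachment text in a plain-text analysis element so that it falls under the `encoding='none'` rules described in the next section. The `type` value `attachment_text`, the default `vendor`, and the helper itself are hypothetical; adapt the fields to your own vCon conventions.

```python
from typing import Any, Dict


def attachment_text_as_analysis(text: str, source_name: str, vendor: str = "example-extractor") -> Dict[str, Any]:
    """Wrap text extracted from an attachment as a plain-text analysis element.

    The `type` and `vendor` values are placeholders chosen for illustration.
    """
    cleaned = " ".join(text.split())  # collapse whitespace left over from extraction
    if not cleaned:
        raise ValueError("no text to index")
    return {
        "type": "attachment_text",
        "vendor": vendor,
        "encoding": "none",  # plain text, so it is eligible for keyword and semantic search
        "body": f"[{source_name}] {cleaned}",
    }
```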

***

## Analysis Encoding and Search

### Analysis Elements ARE Searchable

Analysis elements are included in search, with filtering based on encoding:

| Encoding         | Keyword Search | Semantic Search | Notes                                |
| ---------------- | -------------- | --------------- | ------------------------------------ |
| `none` or `NULL` | ✅ Yes          | ✅ Yes           | Plain text content, ideal for search |
| `json`           | ✅ Yes          | ❌ No            | Included in keyword search only      |
| `base64url`      | ✅ Yes          | ❌ No            | Included in keyword search only      |

### Why Filter Semantic Search by Encoding?

Analysis with `encoding='none'` contains human-readable text like:

* Conversation summaries
* Transcriptions
* Sentiment analysis results
* Translation output
* Natural language insights

These are ideal for semantic search because they contain meaningful natural language.

Analysis with `encoding='json'` or `encoding='base64url'` typically contains:

* Structured data (poor quality embeddings)
* Binary content (not suitable for embeddings)
* Encoded data (not searchable as text)
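
The encoding table can be expressed as a small lookup, which is handy when deciding how to store a new analysis element. This Python sketch simply restates the table; the function name is illustrative.

```python
from typing import Optional, Set


def search_modes_for_analysis(encoding: Optional[str]) -> Set[str]:
    """Return which search modes cover an analysis element with this encoding."""
    if encoding is None or encoding == "none":
        return {"keyword", "semantic"}
    if encoding in ("json", "base64url"):
        return {"keyword"}
    raise ValueError(f"unknown encoding: {encoding!r}")
```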

***

## Search Comparison

| Feature             | vcon\_search                                          | search\_vcons | search\_vcons\_content | search\_vcons\_semantic | search\_vcons\_hybrid |
| ------------------- | ----------------------------------------------------- | ------------- | ---------------------- | ----------------------- | --------------------- |
| Subject             | ✅ Depends on mode                                     | ✅ Filter      | ✅ Search               | ✅ Search                | ✅ Search              |
| Dialog              | ✅ Depends on mode/include                             | ❌             | ✅ Search               | ✅ Search                | ✅ Search              |
| Analysis            | ✅ Depends on mode/include                             | ❌             | ✅ Search               | ✅ (encoding=none)       | ✅ All                 |
| Attachments         | ✅ Via explicit include groups on fetch/search results | ❌             | ❌                      | ❌                       | ❌                     |
| Party Info          | ✅ Depends on mode/include                             | ✅ Filter      | ✅ Search               | ❌                       | ✅ Search              |
| Tags                | ✅ Filter and return                                   | ❌             | ✅ Filter               | ✅ Filter                | ✅ Filter              |
| Ranking             | ✅ Depends on mode                                     | ❌             | ✅ Relevance            | ✅ Similarity            | ✅ Combined            |
| Snippets            | ❌                                                     | ❌             | ✅ Yes                  | ❌                       | ❌                     |
| Requires Embeddings | Only for semantic/hybrid mode                         | ❌             | ❌                      | ✅                       | ⚠️ Optional           |
| Cursor Pagination   | ✅ Yes                                                 | ❌             | ❌                      | ❌                       | ❌                     |
| Response Budgeting  | ✅ `max_response_bytes`                                | ❌             | ❌                      | ❌                       | ❌                     |

***

## Best Practices

### When to Use Each Tool

1. **`vcon_search`**: Default choice for new clients
   * "Use one parser for metadata, keyword, semantic, and hybrid search"
   * "Return lightweight summaries plus dealer info"
   * "Fail loudly instead of silently returning oversized payloads"
2. **`search_vcons`**: Quick metadata lookups in older clients
   * "Find vCons with party email `john@example.com`"
   * "Show me vCons from last week"
   * "List vCons with subject containing 'urgent'"
3. **`search_vcons_content`**: Keyword-based content search
   * "Find conversations mentioning 'refund'"
   * "Search for 'technical support' in dialog"
   * "Find analysis containing 'positive sentiment'"
4. **`search_vcons_semantic`**: Concept-based search
   * "Find conversations where customer was unhappy"
   * "Show me calls about payment issues"
   * "Find similar conversations to this one"
5. **`search_vcons_hybrid`**: Comprehensive search
   * "Find all billing-related conversations" (gets both exact matches and related topics)
   * "Search for customer complaints" (finds variations and synonyms)
   * Best when you want both precision and recall

### Performance Tips

1. **Use filters**: Date ranges and tags can dramatically reduce search scope
2. **Set appropriate limits**: Start with smaller limits (10-20) for faster results
3. **Choose the right tool**: Don't use semantic search if keyword search is sufficient
4. **Pre-generate embeddings**: Semantic search requires embeddings to be generated beforehand

***

## Generating Embeddings

For semantic and hybrid search to work effectively, you need to generate embeddings for your vCons.

See the following guides:

* [INGEST\_AND\_EMBEDDINGS.md](https://github.com/vcon-dev/vcon-mcp/blob/main/docs/development/INGEST_AND_EMBEDDINGS.md) - Complete guide to embedding generation
* [EMBEDDING\_STRATEGY\_UPGRADE.md](https://github.com/vcon-dev/vcon-mcp/blob/main/docs/development/EMBEDDING_STRATEGY_UPGRADE.md) - Details on which content is embedded

**Quick start:**

```bash
# Generate embeddings continuously
npm run sync:embeddings

# Or as part of full sync
npm run sync

# Check embedding coverage
npm run embeddings:check
```

***

## Troubleshooting

### "No results found" for content search

* Check that the content exists in dialog or analysis
* Try a simpler query (fewer words)
* Use wildcards or partial words
* Check date range filters

### "Embedding generation not yet implemented"

* Semantic search currently requires pre-computed embeddings
* Use `search_vcons_content` for keyword search instead
* Generate embeddings using the scripts in `/scripts/`

### "Embedding must be 384 dimensions"

* The system uses 384-dimensional embeddings
* If you're providing embeddings, ensure they match this dimension
* Use `text-embedding-3-small` with `dimensions=384` (OpenAI)
* Or use `sentence-transformers/all-MiniLM-L6-v2` (Hugging Face)
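
If you supply your own vectors, a client-side check can catch dimension mismatches before the request is sent. A minimal Python sketch (the helper name is illustrative):

```python
import math
from typing import List, Sequence

EMBEDDING_DIMENSIONS = 384  # dimension required by the search tools in this guide


def validate_embedding(vector: Sequence[float]) -> List[float]:
    """Return the vector as a list of floats, or raise if it cannot be used."""
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(vector)}")
    values = [float(v) for v in vector]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values
```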

### Poor search results

* For keyword search: Try simpler, more specific terms
* For semantic search: Ensure embeddings are up to date
* For hybrid search: Adjust `semantic_weight` parameter
* Consider using tags to filter results

***

## Examples

### Find customer complaints in dialog

```json
{
  "query": "customer complaint angry upset frustrated",
  "limit": 20
}
```

### Find high-priority sales conversations

```json
{
  "query": "pricing quote proposal",
  "tags": {
    "department": "sales",
    "priority": "high"
  },
  "start_date": "2024-01-01T00:00:00Z"
}
```

### Hybrid search with keyword emphasis

```json
{
  "query": "billing invoice payment",
  "semantic_weight": 0.3,
  "limit": 30
}
```

### Find conversations similar to a specific vCon

1. Get the vCon's embedding from the database
2. Use it in semantic search:

```json
{
  "embedding": [0.123, 0.456, ...],  // 384 dimensions
  "threshold": 0.75,
  "limit": 10
}
```

***

## Related Documentation

* [Getting Started](/guide/getting-started.md) - Getting started with vCon MCP
* [Ingest and Embeddings](https://github.com/vcon-dev/vcon-mcp/blob/main/docs/development/INGEST_AND_EMBEDDINGS.md) - Embedding generation
* [Search Optimization Guide](https://github.com/vcon-dev/vcon-mcp/blob/main/docs/SEARCH_OPTIMIZATION_GUIDE.md) - Database search performance

