PLAYBOOK · 03

Playbook · Vector & RAG

Index 1B vectors on one box with HNSW, query with hybrid BM25 + kNN fusion, and use the dashboards to watch recall, latency, and cache hit rate live.

Schema

$ curl -sX PUT http://localhost:8080/v1/indices/docs \
    -H 'Content-Type: application/json' \
    -d '{
      "fields": {
        "@timestamp": "date",
        "doc_id":     "keyword",
        "chunk_id":   "keyword",
        "text":       "text",
        "title":      "text",
        "tags":       "keyword",
        "embedding":  { "type": "dense_vector", "dims": 1536, "metric": "cosine" }
      }
    }'
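The embedding field is declared with "dims": 1536 and "metric": "cosine". Every stored vector therefore needs exactly 1536 components, and similarity is the cosine of the angle between the query and document vectors. Below is a minimal Python sketch of both checks, meant to run client-side. The helper names are hypothetical and are not part of XERJ.

```python
import math
from typing import Sequence

EMBEDDING_DIMS = 1536  # must match "dims" for the embedding field in the schema


def check_dims(vector: Sequence[float], dims: int = EMBEDDING_DIMS) -> None:
    """Raise if a vector does not match the schema's declared dimension."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dims, got {len(vector)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), always in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc = [0.0] * EMBEDDING_DIMS
    doc[0] = 1.0
    check_dims(doc)
    query = [0.0] * EMBEDDING_DIMS
    query[0], query[1] = 1.0, 1.0
    print(round(cosine_similarity(doc, query), 4))  # 0.7071
```

Cosine ignores vector length, so scaling a vector leaves its scores unchanged. Many engines normalize vectors to unit length at ingest so that cosine reduces to a dot product. This page does not say whether XERJ does so.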

Ingest with embeddings

Configure an embedding endpoint in the [embedding] section and XERJ calls it inline during turbo-ingest:

[embedding]
default_endpoint = "https://api.openai.com/v1/embeddings"
default_model    = "text-embedding-3-small"
batch_size       = 64
timeout_ms       = 5000
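With batch_size = 64, each call to the endpoint carries at most 64 chunk texts. The sketch below works through that arithmetic against the OpenAI embeddings request shape (a model plus an input array). It also restores input order from the index field of each returned item. It is an illustration only, not XERJ's internal client, and it sends nothing over the network. The function names are hypothetical.

```python
import json
from typing import Iterator, List, Sequence

BATCH_SIZE = 64                   # mirrors batch_size in [embedding]
MODEL = "text-embedding-3-small"  # mirrors default_model in [embedding]


def batches(texts: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def request_body(batch: List[str], model: str = MODEL) -> str:
    """JSON body for POST /v1/embeddings: one model name, an array of inputs."""
    return json.dumps({"model": model, "input": batch})


def vectors_in_order(response: dict) -> List[List[float]]:
    """Return embeddings sorted by the 'index' field of each item in 'data'."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    chunks = [f"chunk {i}" for i in range(150)]
    print([len(b) for b in batches(chunks)])  # [64, 64, 22]
    fake = {"data": [{"index": 1, "embedding": [0.2]},
                     {"index": 0, "embedding": [0.1]}]}
    print(vectors_in_order(fake))  # [[0.1], [0.2]]
```

In this example, 150 chunks become three requests of 64, 64 and 22 texts, and each request is allowed up to timeout_ms = 5000. By default, text-embedding-3-small returns 1536-dimensional vectors, which matches "dims": 1536 in the schema.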

Alternatively, generate the vectors client-side and pass them directly in the NDJSON body, in the embedding field.
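Below is a minimal sketch of client-side NDJSON. It assumes each line is one JSON document whose keys are the schema fields. This page does not say whether turbo-ingest also expects a per-document action line (as Elasticsearch's _bulk API does) or which date formats it accepts, so treat both as assumptions. fake_embed is a hypothetical placeholder for your own embedding call.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Sequence

DIMS = 1536  # must match "dims" for the embedding field in the schema


def fake_embed(text: str) -> List[float]:
    """Hypothetical placeholder: swap in a real call to your embedding model."""
    vec = [0.0] * DIMS
    vec[len(text) % DIMS] = 1.0
    return vec


def to_ndjson(chunks: Iterable[dict], embed: Callable[[str], Sequence[float]]) -> str:
    """Serialize chunks as NDJSON: one JSON document per line, embedding attached."""
    out = []
    for chunk in chunks:
        vector = [float(x) for x in embed(chunk["text"])]
        if len(vector) != DIMS:
            raise ValueError(f"{chunk['chunk_id']}: expected {DIMS} dims, got {len(vector)}")
        doc = {
            # Assumption: ISO 8601 strings are accepted for the "date" field type.
            "@timestamp": datetime.now(timezone.utc).isoformat(),
            "doc_id": chunk["doc_id"],
            "chunk_id": chunk["chunk_id"],
            "text": chunk["text"],
            "embedding": vector,
        }
        # Optional fields are copied only when the chunk has them.
        for optional in ("title", "tags"):
            if optional in chunk:
                doc[optional] = chunk[optional]
        # Compact separators keep 1536-float lines as short as possible.
        out.append(json.dumps(doc, separators=(",", ":")) + "\n")
    return "".join(out)


if __name__ == "__main__":
    chunks = [
        {"doc_id": "kb-42", "chunk_id": "kb-42#0", "title": "API keys",
         "text": "Rotate keys by issuing a second key first.", "tags": ["security"]},
        {"doc_id": "kb-42", "chunk_id": "kb-42#1",
         "text": "Revoke the old key after clients switch over."},
    ]
    body = to_ndjson(chunks, fake_embed)
    print(len(body.splitlines()))  # 2
```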

Pure kNN retrieval

{
  "knn": {
    "field":      "embedding",
    "query_vector": [0.12, 0.08, -0.31, ...],
    "k":          20,
    "num_candidates": 200
  }
}
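Here k is the number of hits returned. By the usual convention (Elasticsearch's knn query uses the same two names), num_candidates is the size of the candidate pool the approximate search gathers before keeping the best k. Treat that reading as an assumption about XERJ. A larger pool usually buys recall at the cost of latency. The sketch below is the exact, brute-force counterpart: it scores every vector by cosine similarity and keeps the top k. This is the ground truth an HNSW result is typically measured against, not how XERJ searches.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_knn(query: Sequence[float],
              vectors: Dict[str, Sequence[float]],
              k: int) -> List[Tuple[str, float]]:
    """Brute-force top-k by cosine similarity, highest score first."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = {
        "a": [1.0, 0.0],
        "b": [0.7, 0.7],
        "c": [0.0, 1.0],
        "d": [-1.0, 0.0],
    }
    print([doc for doc, _ in exact_knn([1.0, 0.1], corpus, k=2)])  # ['a', 'b']
```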

Hybrid · BM25 + kNN fusion

One request, one planner pass, one round trip.

{
  "hybrid": {
    "fusion": "rrf",
    "queries": [
      { "match": { "text": "kernel panic after kernel 6.1 upgrade" } },
      { "knn":   { "field": "embedding", "query_vector": [...], "k": 50 } }
    ]
  },
  "size": 10
}
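"fusion": "rrf" refers to Reciprocal Rank Fusion. Each document's fused score is the sum, over the result lists that contain it, of 1 / (c + rank), where rank is the document's 1-based position in that list. RRF uses only ranks, so BM25 scores and cosine similarities never need to share a scale. The constant c = 60 below is the value commonly used in the literature and is an assumption here, since this playbook does not document XERJ's default. A minimal sketch:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf(result_lists: Sequence[Sequence[str]],
        c: int = 60,
        size: int = 10) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion over ranked lists of doc IDs (best first)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:size]


if __name__ == "__main__":
    bm25 = ["d3", "d1", "d7"]  # ranking from the "match" clause
    knn = ["d1", "d9", "d3"]   # ranking from the "knn" clause
    for doc_id, score in rrf([bm25, knn], size=3):
        print(doc_id, round(score, 5))
    # d1 0.03252
    # d3 0.03227
    # d9 0.01613
```

d1 comes out on top because it ranks high in both lists, even though BM25 alone put d3 first.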

Semantic search · embed at query time

Skip the client-side embedding step: XERJ embeds the query text using the endpoint configured in [embedding].

{
  "semantic_search": {
    "field": "embedding",
    "text":  "how do I rotate api keys without downtime"
  }
}
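Conceptually, semantic_search folds two steps into one request: it embeds the text with the configured endpoint, then runs a kNN query on the field. The sketch below builds a roughly equivalent pure kNN body on the client, assuming that equivalence holds. toy_embed is a hypothetical stand-in for a real embedding call. The values k = 20 and num_candidates = 200 are copied from the pure kNN example, not from any documented semantic_search defaults.

```python
import json
from typing import Callable, List, Sequence


def knn_body_for_text(text: str,
                      embed: Callable[[str], Sequence[float]],
                      field: str = "embedding",
                      k: int = 20,
                      num_candidates: int = 200) -> str:
    """Client-side stand-in for semantic_search: embed the text, then build a knn body."""
    # Assumed constraint, following the common convention for candidate pools.
    if num_candidates < k:
        raise ValueError("num_candidates should be at least k")
    query_vector: List[float] = [float(x) for x in embed(text)]
    return json.dumps({
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    })


def toy_embed(text: str) -> List[float]:
    """Hypothetical 3-dim embedding, for demonstration only."""
    return [float(len(text)), float(text.count(" ")), 1.0]


if __name__ == "__main__":
    print(knn_body_for_text("how do I rotate api keys without downtime", toy_embed))
```

Building the vector yourself lets you cache or reuse query embeddings. semantic_search instead keeps the model choice in server config and saves the client a separate embedding call.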

Dashboards

The playground has two RAG-specific views:

VECTOR · INDEX: resident set size, HNSW graph stats, quantization savings, and recall at k vs. ef_search.
RAG · QUALITY: hit rate by intent, cache hit rate, and top retrieved documents.
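The recall at k vs. ef_search panel compares approximate hits against exact ones. Below is a minimal sketch of the usual definition, recall@k = |approximate top-k ∩ exact top-k| / k. This page does not say how XERJ samples queries or builds ground truth for the panel. In HNSW, ef_search is the size of the dynamic candidate list kept during a query, so raising it generally increases recall and latency together.

```python
from typing import Sequence, Tuple


def recall_at_k(approx: Sequence[str], exact: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact[:k])
    if not truth:
        raise ValueError("exact results are empty")
    # Dividing by len(truth) equals dividing by k whenever the exact list has >= k hits.
    return len(set(approx[:k]) & truth) / len(truth)


def mean_recall_at_k(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]], k: int) -> float:
    """Average recall@k over (approx, exact) pairs, one pair per sampled query."""
    if not pairs:
        raise ValueError("need at least one query")
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


if __name__ == "__main__":
    approx = ["a", "b", "x", "c"]
    exact = ["a", "b", "c", "d"]
    print(recall_at_k(approx, exact, k=4))  # 0.75
    print(mean_recall_at_k([(approx, exact), (exact, exact)], k=4))  # 0.875
```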

Source · engine/crates/vector/src/hnsw.rs
