PLAYBOOK · 03
Playbook · Vector & RAG
Index 1B vectors on one box with HNSW, query with hybrid BM25 + kNN fusion, and use the dashboards to watch recall, latency, and cache hit rate live.
Schema
$ curl -sX PUT http://localhost:8080/v1/indices/docs \
    -H 'Content-Type: application/json' \
    -d '{
      "fields": {
        "@timestamp": "date",
        "doc_id": "keyword",
        "chunk_id": "keyword",
        "text": "text",
        "title": "text",
        "tags": "keyword",
        "embedding": { "type": "dense_vector", "dims": 1536, "metric": "cosine" }
      }
    }'
Ingest with embeddings
Configure an embedding endpoint in [embedding] and XERJ will call it inline during turbo-ingest:
[embedding]
default_endpoint = "https://api.openai.com/v1/embeddings"
default_model = "text-embedding-3-small"
batch_size = 64
timeout_ms = 5000
Or generate vectors client-side and pass them in the NDJSON body directly.
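For the client-side route, a minimal Python sketch of building the NDJSON body might look like the following. The field names come from the schema above. Everything else is an assumption made for illustration: one bare JSON document per line with no action line, the timestamp format, `tags` as an array, and the helper name `to_ndjson`. The ingest endpoint itself is not shown because this page does not name it.

```python
import json

EMBEDDING_DIMS = 1536  # matches "dims" in the docs schema above


def to_ndjson(chunks):
    """Serialize chunk dicts to NDJSON: one compact JSON object per line.

    Hypothetical helper. It assumes the ingest body is one bare document
    per line, which this page does not confirm.
    """
    lines = []
    for chunk in chunks:
        vector = chunk["embedding"]
        # Catch dimension mismatches before the request is sent.
        if len(vector) != EMBEDDING_DIMS:
            raise ValueError(
                f"chunk {chunk['chunk_id']}: expected {EMBEDDING_DIMS} dims, "
                f"got {len(vector)}"
            )
        lines.append(json.dumps(chunk, separators=(",", ":")))
    return "\n".join(lines) + "\n"


# Placeholder vector for illustration only; a real one comes from your model.
example = {
    "@timestamp": "2024-01-01T00:00:00Z",
    "doc_id": "doc-1",
    "chunk_id": "doc-1#0",
    "text": "Rotate API keys by issuing the new key before revoking the old one.",
    "title": "Key rotation",
    "tags": ["security"],
    "embedding": [0.0] * EMBEDDING_DIMS,
}
body = to_ndjson([example])
```

Checking the vector length on the client keeps a wrong embedding model from surfacing only as a server-side rejection.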
Pure kNN retrieve
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, 0.08, -0.31, ...],
    "k": 20,
    "num_candidates": 200
  }
}
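HNSW search is approximate, so it helps to have an exact answer to compare against. The sketch below is a brute-force cosine kNN over a small in-memory set. It is not how XERJ executes the query; it is a reference baseline using the same `cosine` metric as the schema. The names `cosine_similarity` and `exact_knn` are hypothetical, and reading `num_candidates` as the size of the candidate pool considered before the top `k` are returned is an assumption based on how that parameter is commonly defined.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def exact_knn(query_vector, vectors, k):
    """Return the ids of the k most similar vectors, best first.

    Scans every vector, so the cost is linear in the collection size.
    Ties are broken by id to keep the output deterministic.
    """
    scored = [
        (cosine_similarity(query_vector, vec), vec_id)
        for vec_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [vec_id for _, vec_id in scored[:k]]


# Tiny 2-dimensional example; real embeddings here have 1536 dimensions.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.7, 0.7],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
top2 = exact_knn([1.0, 0.1], vectors, k=2)  # ["a", "b"]
```

Running this over a sample of queries gives the ground truth needed to measure how much recall the approximate index gives up.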
Hybrid · BM25 + kNN fusion
One request, one planner pass, one round trip.
{
  "hybrid": {
    "fusion": "rrf",
    "queries": [
      { "match": { "text": "kernel panic after kernel 6.1 upgrade" } },
      { "knn": { "field": "embedding", "query_vector": [...], "k": 50 } }
    ]
  },
  "size": 10
}
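To make `"fusion": "rrf"` concrete, here is a minimal Python sketch of reciprocal rank fusion as it is generally defined: each document scores the sum of `1 / (rank_constant + rank)` over every ranked list it appears in. The function name `rrf_fuse` and the `rank_constant` of 60 are assumptions. 60 is a commonly used default, but this page does not state the constant XERJ uses or whether it is configurable.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60, size=10):
    """Fuse ranked lists of document ids with reciprocal rank fusion.

    rankings: iterable of lists, each ordered best first.
    Only ranks are used, so BM25 scores and vector similarities never
    need to be normalized onto a common scale.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; break ties by id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:size]]


bm25_hits = ["c1", "c2", "c3"]  # hypothetical ids from the match query
knn_hits = ["c3", "c1", "c4"]   # hypothetical ids from the knn query
fused = rrf_fuse([bm25_hits, knn_hits], size=3)  # ["c1", "c3", "c2"]
```

Documents ranked well by both retrievers rise to the top, while a document found by only one still contributes, which is why the kNN leg can ask for a larger `k` than the final `size`.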
Semantic search · embed at query time
Skip the client-side embedding step. XERJ embeds the query using the configured endpoint.
{
  "semantic_search": {
    "field": "embedding",
    "text": "how do I rotate api keys without downtime"
  }
}
Dashboards
The playground has two RAG-specific views:
- VECTOR · INDEX: resident set size, HNSW graph stats, quantization savings, and recall at k vs ef_search.
- RAG · QUALITY: hit rate by intent, cache hit rate, and top retrieved documents.
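For reference, recall at k is usually computed as the fraction of the true top-k neighbors that the approximate search also returned. In HNSW, `ef_search` sets how many candidates the query explores, so raising it typically improves recall at the cost of latency. The sketch below shows the usual definition and is not the dashboard's implementation; `recall_at_k` and the sample ids are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate top-k also found.

    approx_ids: ids returned by the approximate (HNSW) search, best first.
    exact_ids: ids from an exact brute-force search, best first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


exact = ["c1", "c2", "c3", "c4", "c5"]
approx = ["c1", "c3", "c2", "c9", "c5"]  # order differs, one neighbor missed
recall = recall_at_k(approx, exact, k=5)  # 4 of 5 found: 0.8
```

Order within the top k does not affect the result, only membership does, so a high recall says nothing about whether the best hit is ranked first.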
Source · engine/crates/vector/src/hnsw.rs