02 · ENGINE

Storage & WAL

XERJ writes one WAL per index and a list of immutable segments. Each segment is a directory of three files, not twelve: a data file, a skip index, and a doc-id sidecar. All three are mmap'd, so reads come straight out of the OS page cache with no application-side buffer.

data/
├── logs/                      · an index
│   ├── schema.json            · field mapping
│   ├── wal/
│   │   ├── wal-000001         · append-only
│   │   └── wal-000002         · rolls at wal_max_size_mb
│   ├── seg-000001/
│   │   ├── segment.seg        · columnar data, mmap'd
│   │   ├── segment.sidx       · skip index for seeks
│   │   └── segment.ids        · doc-id sidecar (external id ↔ internal ordinal)
│   └── seg-000002/
│       ├── segment.seg
│       ├── segment.sidx
│       └── segment.ids
├── traces/
│   └── ...
└── cluster/                   · Raft metadata, only present in clustered mode
    ├── raft-log-*
    └── snapshots/
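
As an illustration of the layout above, the following sketch builds the on-disk paths for one segment and one WAL generation. The function and type names (segment_paths, wal_path, SegmentPaths) are hypothetical and are not taken from the XERJ source; the six-digit zero padding is inferred from the names shown in the tree.

```rust
use std::path::{Path, PathBuf};

/// Paths of the three files that make up one segment.
#[derive(Debug, PartialEq)]
pub struct SegmentPaths {
    pub data: PathBuf,       // segment.seg: columnar data
    pub skip_index: PathBuf, // segment.sidx: skip index for seeks
    pub doc_ids: PathBuf,    // segment.ids: doc-id sidecar
}

/// Builds `<data_dir>/<index>/seg-NNNNNN/segment.{seg,sidx,ids}`.
/// Assumption: segment numbers are zero-padded to six digits, as in the tree.
pub fn segment_paths(data_dir: &Path, index: &str, seg_no: u32) -> SegmentPaths {
    let dir = data_dir.join(index).join(format!("seg-{:06}", seg_no));
    SegmentPaths {
        data: dir.join("segment.seg"),
        skip_index: dir.join("segment.sidx"),
        doc_ids: dir.join("segment.ids"),
    }
}

/// Builds `<data_dir>/<index>/wal/wal-NNNNNN` for one WAL generation.
pub fn wal_path(data_dir: &Path, index: &str, generation: u32) -> PathBuf {
    data_dir
        .join(index)
        .join("wal")
        .join(format!("wal-{:06}", generation))
}
```

For example, segment_paths(Path::new("data"), "logs", 2) yields the three paths under data/logs/seg-000002/.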

WAL

Append-only per index. Generation-rotated at wal_max_size_mb (default 512 MiB). Retained until the flush checkpoint passes the tail generation, then the old file is released. Fsync policy is controlled by [storage] wal_sync.
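
The rotation and retention rules can be sketched with a small in-memory model. This is not the engine's implementation: the Wal type and its methods are hypothetical, no real I/O or fsync happens, and the sketch assumes that "the checkpoint passes a generation" means the checkpoint now lives in a later generation.

```rust
/// In-memory model of a generation-rotated WAL (no real I/O).
pub struct Wal {
    max_bytes: u64,
    /// (generation number, bytes written), oldest first.
    /// The last entry is the active, append-only file.
    generations: Vec<(u64, u64)>,
}

impl Wal {
    /// `max_size_mb` plays the role of wal_max_size_mb.
    pub fn new(max_size_mb: u64) -> Self {
        Wal {
            max_bytes: max_size_mb * 1024 * 1024,
            generations: vec![(1, 0)],
        }
    }

    /// Appends a record of `len` bytes. If the record would push a non-empty
    /// active file past the size limit, a new generation is started first.
    /// Returns the generation the record landed in.
    pub fn append(&mut self, len: u64) -> u64 {
        let (generation, size) = *self
            .generations
            .last()
            .expect("there is always one active generation");
        if size > 0 && size + len > self.max_bytes {
            self.generations.push((generation + 1, len));
            generation + 1
        } else {
            self.generations.last_mut().unwrap().1 = size + len;
            generation
        }
    }

    /// Releases every generation older than `checkpoint_generation`.
    /// The active generation is always kept.
    pub fn release_before(&mut self, checkpoint_generation: u64) {
        let active = self.generations.last().unwrap().0;
        self.generations
            .retain(|&(g, _)| g >= checkpoint_generation || g == active);
    }

    /// Generation numbers still on "disk", oldest first.
    pub fn live_generations(&self) -> Vec<u64> {
        self.generations.iter().map(|&(g, _)| g).collect()
    }
}
```

With a 1 MiB limit, two 512 KiB appends fill generation 1, the next append rolls to generation 2, and release_before(2) drops generation 1.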

Segments

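Three files per segment:

segment.seg · columnar data, mmap'd
segment.sidx · skip index for seeks
segment.ids · doc-id sidecar (external id ↔ internal ordinal)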

Segments are immutable once written. Updates and deletes work by writing a new segment and a tombstone; merges rewrite surviving documents into a larger segment.
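
A minimal sketch of that idea follows, assuming a tombstone identifies one copy of a document by (segment position, doc id). The Segment type, the tombstone representation, and the merge function are illustrative only and do not reflect the real on-disk format.

```rust
use std::collections::HashSet;

/// An immutable segment: (doc id, body) pairs that are never edited in place.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub docs: Vec<(u64, String)>,
}

/// Rewrites the surviving documents of `inputs` into one larger segment.
/// A tombstone `(segment position, doc id)` marks one specific copy as dead,
/// so an update is "new copy in a newer segment + tombstone on the old copy",
/// and a delete is just the tombstone.
pub fn merge(inputs: &[Segment], tombstones: &HashSet<(usize, u64)>) -> Segment {
    let mut docs = Vec::new();
    for (seg_no, seg) in inputs.iter().enumerate() {
        for (id, body) in &seg.docs {
            // Copy the document only if this particular copy is still live.
            if !tombstones.contains(&(seg_no, *id)) {
                docs.push((*id, body.clone()));
            }
        }
    }
    Segment { docs }
}
```

The input segments are only read, never modified; the merged result is a new segment that can replace them once it is fully written.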

Merges

[merge] strategy picks between size_tiered (default) and log_structured (LSM-tree-style levelled). min_segments sets the segment count that triggers a merge (default 10). io_rate_mb_per_sec throttles the merger to leave I/O headroom for queries (default 100 MiB/s; set 0 to disable throttling). max_concurrent caps parallel merge workers (default 1; raise to 2–4 on fast NVMe).
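
The defaults above can be captured in a small config type. This is a sketch under stated assumptions: the MergeConfig struct and its methods are hypothetical, the trigger is assumed to fire when the segment count reaches min_segments, and a rate of 0 is assumed to mean "no throttle".

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MergeStrategy {
    SizeTiered,
    LogStructured,
}

/// Mirrors the [merge] keys described above.
#[derive(Debug, Clone, PartialEq)]
pub struct MergeConfig {
    pub strategy: MergeStrategy,
    pub min_segments: usize,
    pub io_rate_mb_per_sec: u64,
    pub max_concurrent: usize,
}

impl Default for MergeConfig {
    /// Defaults as stated in the text.
    fn default() -> Self {
        MergeConfig {
            strategy: MergeStrategy::SizeTiered,
            min_segments: 10,
            io_rate_mb_per_sec: 100,
            max_concurrent: 1,
        }
    }
}

impl MergeConfig {
    /// Assumption: a merge triggers once the count reaches min_segments.
    pub fn should_merge(&self, segment_count: usize) -> bool {
        segment_count >= self.min_segments
    }

    /// Shortest time a merge writing `bytes` can take under the throttle.
    /// Assumption: a rate of 0 disables the throttle entirely.
    pub fn min_merge_duration(&self, bytes: u64) -> Duration {
        if self.io_rate_mb_per_sec == 0 {
            return Duration::ZERO;
        }
        let bytes_per_sec = self.io_rate_mb_per_sec * 1024 * 1024;
        Duration::from_secs_f64(bytes as f64 / bytes_per_sec as f64)
    }
}
```

Under the default 100 MiB/s limit, rewriting 200 MiB of segment data takes at least 2 seconds, which is the headroom the throttle reserves for queries.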

Cluster metadata

In single-node mode there is no cluster/ directory — everything the engine needs is right next to the index data. When the server is started with a cluster config, a sibling cluster/ directory holds the embedded Raft log and snapshots. Index data is never in Raft — only the metadata (index schemas, shard assignments, node roster). See Clustering.

Source · engine/crates/storage/src/segment.rs · engine/crates/storage/src/lib.rs