03 · ENGINE
Compression & encodings
XERJ compresses data in two layers. Column encodings are chosen per field at write time, based on each column's shape (its value type and cardinality). Block compression then runs over the encoded column data. Together, the two layers account for the 2.8× disk-space saving over Elasticsearch on SIEM workloads.
Block compression
Three levels, controlled by [compression] level:
- fast — LZ4. ~500 MB/s throughput. Use for write-heavy paths.
- balanced — Zstd L3. Default. ~3-4× ratio.
- best — Zstd L19. ~5-6× ratio. Use for cold indices you query rarely.
block_size_docs (16-4096, default 128) sets the number of documents per compression block.
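To make these settings concrete, here is a minimal Rust sketch of how the three levels and the block_size_docs bounds could be modeled. The type and method names (Level, Codec, CompressionConfig, validate) are hypothetical and are not taken from the XERJ source. Only the level-to-codec mapping, the 16-4096 range, and the two defaults come from the text above.

```rust
/// Block-compression level, named after the three documented options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Fast,
    Balanced,
    Best,
}

/// Codec a level resolves to. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Lz4,
    Zstd { level: i32 },
}

impl Level {
    /// Mapping stated in the docs: fast = LZ4, balanced = Zstd L3, best = Zstd L19.
    pub fn codec(self) -> Codec {
        match self {
            Level::Fast => Codec::Lz4,
            Level::Balanced => Codec::Zstd { level: 3 },
            Level::Best => Codec::Zstd { level: 19 },
        }
    }
}

/// Assumed in-memory shape of the [compression] settings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompressionConfig {
    pub level: Level,
    pub block_size_docs: u32,
}

impl Default for CompressionConfig {
    /// Documented defaults: balanced level, 128 documents per block.
    fn default() -> Self {
        CompressionConfig {
            level: Level::Balanced,
            block_size_docs: 128,
        }
    }
}

impl CompressionConfig {
    /// Reject block sizes outside the documented 16-4096 range.
    pub fn validate(&self) -> Result<(), String> {
        if (16..=4096).contains(&self.block_size_docs) {
            Ok(())
        } else {
            Err(format!(
                "block_size_docs must be between 16 and 4096, got {}",
                self.block_size_docs
            ))
        }
    }
}
```

With this model, CompressionConfig::default().validate() succeeds, while a config with block_size_docs set to 8 is rejected before any data is written.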
Column encodings
Picked automatically at ingest time from this set:
- BitsetEnum: ≤16 unique values · 4-bit index + bitmaps. Used for level, status_class.
- Dictionary: ≤256 unique strings. Used for service, host, method.
- DeltaTimestamp: Apache / Nginx / ISO detected → µs deltas-of-deltas. Used for @timestamp.
- PackedIp: IPv4 as u32, IPv6 as u128.
- UrlTemplate: Path normalization. /users/:id collapses high-cardinality URLs into templates.
- Varint: Small integers. Used for status, bytes.
- Bitpacked: Booleans · 8 per byte.
- FixedPrecision: Floats as scaled varints. Used for percentiles, durations.
- RawString: Fallback · high cardinality. Used for message, user_agent.

Source · engine/crates/compress/src/field_codec.rs
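
Two of the encodings above are easy to illustrate in a few lines of Rust. The sketch below is not the code in field_codec.rs. All names in it (StringEncoding, pick_string_encoding, delta_of_delta_encode, delta_of_delta_decode) are hypothetical. It assumes that string columns are classified by distinct-value count alone, using the ≤16 and ≤256 thresholds listed above, and that timestamps have already been parsed into i64 microseconds. The real engine may weigh other signals, and format detection is not shown.

```rust
use std::collections::HashSet;

/// The three string encodings from the list. Hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringEncoding {
    BitsetEnum,
    Dictionary,
    RawString,
}

/// Choose an encoding from the number of distinct values only (an assumption).
pub fn pick_string_encoding(values: &[&str]) -> StringEncoding {
    let distinct: HashSet<&str> = values.iter().copied().collect();
    match distinct.len() {
        0..=16 => StringEncoding::BitsetEnum,
        17..=256 => StringEncoding::Dictionary,
        _ => StringEncoding::RawString,
    }
}

/// Delta-of-delta transform over microsecond timestamps.
/// Output layout: [first value, first delta, second-order deltas...].
/// Uses plain i64 arithmetic, so it assumes the deltas do not overflow.
pub fn delta_of_delta_encode(ts: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(ts.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &t) in ts.iter().enumerate() {
        if i == 0 {
            out.push(t);
        } else {
            let delta = t - prev;
            // prev_delta starts at 0, so the second slot holds the raw first delta.
            out.push(delta - prev_delta);
            prev_delta = delta;
        }
        prev = t;
    }
    out
}

/// Inverse of `delta_of_delta_encode`.
pub fn delta_of_delta_decode(enc: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(enc.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &e) in enc.iter().enumerate() {
        let t = if i == 0 {
            e
        } else {
            prev_delta += e;
            prev + prev_delta
        };
        out.push(t);
        prev = t;
    }
    out
}
```

Regularly spaced log timestamps turn into runs of zeros and small integers under this transform, which is what makes the column cheap to store once a varint or block compressor runs over it.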