03 · ENGINE

Compression & encodings

XERJ uses two layers of compression. Column encodings are chosen per field at write time, based on the column's shape; block compression then runs over the encoded column data. Together, the two layers account for the 2.8× disk win over Elasticsearch on SIEM workloads.

Block compression

Three levels, controlled by [compression] level:

block_size_docs (16 to 4096, default 128) sets the compression block size, measured in documents.
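
To make the block-size setting concrete, here is a minimal Rust sketch of how an encoded column might be cut into blocks before compression. It assumes the size is measured in documents (as the name block_size_docs suggests) and that out-of-range values are clamped; the engine may reject them instead. The names effective_block_size and split_into_blocks are hypothetical and are not taken from the XERJ source.

```rust
/// Bounds and default quoted in the text for `block_size_docs`.
const MIN_BLOCK_DOCS: usize = 16;
const MAX_BLOCK_DOCS: usize = 4096;
const DEFAULT_BLOCK_DOCS: usize = 128;

/// Resolve the block size to use.
/// Assumption: out-of-range requests are clamped into 16..=4096.
/// A real implementation could just as well return a config error.
fn effective_block_size(requested: Option<usize>) -> usize {
    requested
        .unwrap_or(DEFAULT_BLOCK_DOCS)
        .clamp(MIN_BLOCK_DOCS, MAX_BLOCK_DOCS)
}

/// Split one encoded column into slices of at most `block_size_docs`
/// values. Each slice would then be handed to the block compressor.
/// The final block may be shorter than the others.
fn split_into_blocks<T>(column: &[T], block_size_docs: usize) -> Vec<&[T]> {
    column.chunks(block_size_docs).collect()
}
```

With the default of 128, a column of 300 values yields two full blocks and one trailing block of 44 values. Smaller blocks generally mean less data to decompress per lookup, while larger blocks give the compressor more context; that trade-off is a general property of block compression, not a measured XERJ result.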

Column encodings

Picked automatically at ingest time from this set:

BitsetEnum: ≤16 unique values · 4-bit index + bitmaps. Used for level, status_class.
Dictionary: ≤256 unique strings. Used for service, host, method.
DeltaTimestamp: Apache / Nginx / ISO formats detected → µs deltas-of-deltas. Used for @timestamp.
PackedIp: IPv4 as u32, IPv6 as u128.
UrlTemplate: Path normalization. Collapses high-cardinality URLs into templates such as /users/:id.
Varint: Small integers. Used for status, bytes.
Bitpacked: Booleans · 8 per byte.
FixedPrecision: Floats as scaled varints. Used for percentiles, durations.
RawString: Fallback · high cardinality. Used for message, user_agent.
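
Two of the ideas in the list above can be sketched in a few lines of Rust. The first function picks a string codec from the distinct-value count using the ≤16 and ≤256 thresholds; the second pair encodes and decodes timestamps as deltas-of-deltas. This is an illustration under assumptions, not the code in field_codec.rs: the Codec enum, the function names, and the choice to count cardinality over the whole input are all hypothetical, and the real engine may sample, use other signals, or lay out the encoded values differently.

```rust
use std::collections::HashSet;

/// Hypothetical subset of the codecs, covering string columns only.
#[derive(Debug, PartialEq, Eq)]
enum Codec {
    BitsetEnum,
    Dictionary,
    RawString,
}

/// Pick a codec from the number of distinct values, using the
/// thresholds from the list: <=16 -> BitsetEnum, <=256 -> Dictionary,
/// anything larger falls back to RawString.
fn choose_string_codec(values: &[&str]) -> Codec {
    let distinct: HashSet<&str> = values.iter().copied().collect();
    match distinct.len() {
        0..=16 => Codec::BitsetEnum,
        17..=256 => Codec::Dictionary,
        _ => Codec::RawString,
    }
}

/// Delta-of-delta encoding for microsecond timestamps.
/// Output layout (an assumption): first value raw, second value as a
/// plain delta, every later value as the change between deltas.
fn delta_of_delta_encode(ts: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(ts.len());
    for (i, &t) in ts.iter().enumerate() {
        let v = match i {
            0 => t,
            1 => t - ts[0],
            _ => (t - ts[i - 1]) - (ts[i - 2 + 1] - ts[i - 2]),
        };
        out.push(v);
    }
    out
}

/// Inverse of `delta_of_delta_encode`.
fn delta_of_delta_decode(enc: &[i64]) -> Vec<i64> {
    let mut out: Vec<i64> = Vec::with_capacity(enc.len());
    let mut delta = 0i64; // running delta between consecutive timestamps
    for (i, &v) in enc.iter().enumerate() {
        if i == 0 {
            out.push(v);
        } else {
            delta += v; // at i == 1 this is the first plain delta
            let next = out[i - 1] + delta;
            out.push(next);
        }
    }
    out
}
```

For evenly spaced log lines the encoded stream is mostly zeros and small integers, which is the kind of input a varint or block compressor handles well. For example, timestamps 1_000_000, 1_000_010, 1_000_020, 1_000_031 encode to 1_000_000, 10, 0, 1.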

Source · engine/crates/compress/src/field_codec.rs