How Apache Fluss Achieves True Pruning in Streaming Storage

Yunhong Zheng
PPMC member of Apache Fluss (Incubating)

TL;DR:

Apache Kafka's "column pruning" is actually pseudo-pruning. All fields still cross the network, and clients discard unwanted ones after the fact. Apache Fluss redesigns the storage format, server-side read path, and write-side batching strategy from the ground up with Arrow IPC columnar storage, zero-copy server-side pruning, and client-side pre-shuffle batching. The result: pruning 90% of columns yields a 10x read throughput improvement, with performance scaling linearly with the pruning ratio.
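To make the difference between the two approaches concrete, here is a minimal sketch in plain Python. It is a toy model, not Fluss's or Kafka's actual wire format: the table shape (10 equal-width int64 columns named `c0`..`c9`), the helper names, and the byte layouts are all assumptions chosen for illustration. It only counts the bytes that would cross the network in each case.

```python
import struct

NUM_ROWS = 1_000
NUM_COLS = 10

# Hypothetical table: 10 int64 columns named c0..c9 (illustrative only).
columns = {
    f"c{i}": [row * NUM_COLS + i for row in range(NUM_ROWS)]
    for i in range(NUM_COLS)
}


def encode_row_oriented(cols):
    """Row-oriented payload: every record carries all of its fields."""
    names = list(cols)
    out = bytearray()
    for r in range(NUM_ROWS):
        for name in names:
            out += struct.pack("<q", cols[name][r])
    return bytes(out)


def encode_columnar(cols):
    """Columnar payload: one contiguous buffer per column."""
    return {name: struct.pack(f"<{len(v)}q", *v) for name, v in cols.items()}


def client_side_prune(payload, wanted_index):
    """Pseudo-pruning: the full payload is transferred, then the client
    extracts the one field it wants from each record."""
    bytes_on_wire = len(payload)
    row_size = 8 * NUM_COLS
    values = [
        struct.unpack_from("<q", payload, r * row_size + 8 * wanted_index)[0]
        for r in range(NUM_ROWS)
    ]
    return bytes_on_wire, values


def server_side_prune(buffers, wanted_names):
    """Server-side pruning: only the requested column buffers are sent.
    The buffers are handed over as-is, with no re-encoding."""
    selected = {name: buffers[name] for name in wanted_names}
    bytes_on_wire = sum(len(buf) for buf in selected.values())
    return bytes_on_wire, selected


row_payload = encode_row_oriented(columns)
col_buffers = encode_columnar(columns)

pseudo_bytes, pseudo_values = client_side_prune(row_payload, wanted_index=3)
pruned_bytes, pruned_cols = server_side_prune(col_buffers, ["c3"])
pruned_values = list(struct.unpack(f"<{NUM_ROWS}q", pruned_cols["c3"]))

print(f"client-side pruning: {pseudo_bytes} bytes on the wire")
print(f"server-side pruning: {pruned_bytes} bytes on the wire")
print(f"reduction factor:    {pseudo_bytes / pruned_bytes:.1f}x")
```

In this model, keeping 1 of 10 equal-width columns cuts the transferred bytes by a factor of 10, and both paths yield the same values for `c3`. Real throughput also depends on column widths, compression, and batching, so the sketch should be read as intuition for the byte accounting rather than as a benchmark.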