


How Apache Fluss Achieves True Pruning in Streaming Storage

Yunhong Zheng
PPMC member of Apache Fluss (Incubating)


TL;DR:

Apache Kafka's "column pruning" is actually pseudo-pruning: every field still crosses the network, and clients discard the unwanted ones after the fact. Apache Fluss redesigns the storage format, the server-side read path, and the write-side batching strategy from the ground up, using Arrow IPC columnar storage, zero-copy server-side pruning, and client-side pre-shuffle batching. The result: pruning 90% of columns yields a 10x improvement in read throughput, and throughput scales linearly with the pruning ratio.
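To make the difference between pseudo-pruning and true pruning concrete, here is a minimal, hypothetical sketch in Python. It is not Fluss's or Kafka's API: `ColumnarBatch`, `fetch_then_discard`, and `server_side_pruned_fetch` are illustrative names, and the model assumes a batch stored as one contiguous buffer per column (loosely like Arrow IPC column buffers), equal-width columns, and no compression or metadata overhead. It counts only the bytes that would cross the network.

```python
from dataclasses import dataclass


# Hypothetical model: a record batch stored as one contiguous byte buffer
# per column (loosely mirroring Arrow IPC's per-column buffers).
@dataclass
class ColumnarBatch:
    columns: dict[str, bytes]


def fetch_then_discard(batch: ColumnarBatch, wanted: list[str]) -> tuple[int, dict[str, bytes]]:
    """Pseudo-pruning: the server ships every column, the client drops extras."""
    sent = sum(len(buf) for buf in batch.columns.values())  # all bytes cross the wire
    kept = {name: batch.columns[name] for name in wanted}   # discarded only after arrival
    return sent, kept


def server_side_pruned_fetch(batch: ColumnarBatch, wanted: list[str]) -> tuple[int, dict[str, memoryview]]:
    """True pruning: only the requested column buffers are shipped.

    memoryview references the existing buffers without copying them,
    standing in for a zero-copy slice of the stored batch.
    """
    views = {name: memoryview(batch.columns[name]) for name in wanted}
    sent = sum(v.nbytes for v in views.values())
    return sent, views


# Ten equal-width columns of 1,000 bytes each; the reader needs only one.
batch = ColumnarBatch({f"c{i}": bytes(1000) for i in range(10)})
full, _ = fetch_then_discard(batch, ["c0"])
pruned, cols = server_side_pruned_fetch(batch, ["c0"])
print(full, pruned, full / pruned)  # 10000 1000 10.0
```

Under these simplifying assumptions, pruning 90% of the columns cuts the transferred bytes by 10x, and the saving grows in proportion to the fraction of columns pruned. That is the intuition behind the linear scaling, provided reads are bound by network and I/O rather than by CPU.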