How Apache Fluss Achieves True Pruning in Streaming Storage

TL;DR:
Apache Kafka's "column pruning" is actually pseudo-pruning: all fields still cross the network, and clients discard the unwanted ones after the fact. Apache Fluss redesigns the storage format, the server-side read path, and the write-side batching strategy from the ground up, using Arrow IPC columnar storage, zero-copy server-side pruning, and client-side pre-shuffle batching. The result: pruning 90% of columns yields a 10x read throughput improvement, and performance scales linearly with the pruning ratio.
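To make the difference concrete, here is a toy back-of-the-envelope model in Python. It is only a sketch: the function names, the fixed 8-byte value width, and the assumption that read throughput is bound by bytes transferred are all illustrative, and none of this is the Fluss or Kafka API.

```python
# Toy model of bytes crossing the network for one fetch.
# Hypothetical names and sizes; not the Fluss or Kafka API.

BYTES_PER_VALUE = 8  # assumption: every value is fixed-width


def bytes_sent_pseudo_pruning(rows: int, total_columns: int, wanted_columns: int) -> int:
    """Row-oriented log: every field is shipped and the client discards the extras.

    `wanted_columns` is accepted but deliberately unused: it does not
    reduce what goes over the wire.
    """
    return rows * total_columns * BYTES_PER_VALUE


def bytes_sent_true_pruning(rows: int, total_columns: int, wanted_columns: int) -> int:
    """Columnar log: the server ships only the requested column buffers."""
    return rows * wanted_columns * BYTES_PER_VALUE


def transfer_ratio(total_columns: int, wanted_columns: int) -> float:
    """How many times fewer bytes true pruning sends than pseudo-pruning."""
    return total_columns / wanted_columns


if __name__ == "__main__":
    rows, total, wanted = 1_000, 10, 1  # keep 1 of 10 columns, i.e. prune 90%
    print(bytes_sent_pseudo_pruning(rows, total, wanted))  # all 10 columns shipped
    print(bytes_sent_true_pruning(rows, total, wanted))    # only 1 column shipped
    print(transfer_ratio(total, wanted))
```

Under these assumptions, keeping 1 of 10 columns moves one tenth of the bytes, which is consistent with the 10x figure quoted above. Real gains also depend on column widths, compression, and server-side CPU cost, which this model ignores.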