3 posts tagged with "Arrow"

How Apache Fluss Achieves True Pruning in Streaming Storage

Yunhong Zheng
PPMC member of Apache Fluss (Incubating)

TL;DR:

Apache Kafka's "column pruning" is actually pseudo-pruning. All fields still cross the network, and clients discard unwanted ones after the fact. Apache Fluss redesigns the storage format, server-side read path, and write-side batching strategy from the ground up with Arrow IPC columnar storage, zero-copy server-side pruning, and client-side pre-shuffle batching. The result: pruning 90% of columns yields a 10x read throughput improvement, with performance scaling linearly with the pruning ratio.
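
The "linear scaling" claim can be sanity-checked with a little arithmetic. The sketch below is a toy model, not Fluss code: it assumes reads are limited by bytes crossing the network and that every column has the same encoded size (both simplifications introduced here for illustration; the function names are hypothetical). Under those assumptions, client-side pruning always ships every column, while server-side pruning ships only the columns kept.

```python
# Toy model (assumption): read cost is proportional to bytes sent over the
# network, and every column value has the same encoded size.

def bytes_transferred(num_rows, num_columns, bytes_per_value, kept_columns, server_side):
    """Bytes that cross the network for one read."""
    # Client-side ("pseudo") pruning still ships every column;
    # server-side pruning ships only the columns the reader asked for.
    columns_sent = kept_columns if server_side else num_columns
    return num_rows * columns_sent * bytes_per_value


def speedup(num_columns, kept_columns):
    """Ratio of bytes sent without server-side pruning to bytes sent with it."""
    full = bytes_transferred(1_000, num_columns, 8, kept_columns, server_side=False)
    pruned = bytes_transferred(1_000, num_columns, 8, kept_columns, server_side=True)
    return full / pruned


for kept in (100, 50, 10):
    print(f"keep {kept}/100 columns -> {speedup(100, kept):.1f}x fewer bytes")
```

Keeping 10 of 100 columns (pruning 90%) sends one tenth of the bytes in this model, which matches the 10x figure quoted above if throughput is network-bound.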

Why Apache Fluss Chose Rust for Its Multi-Language SDK

Luo Yuxia
PPMC member of Apache Fluss (Incubating)
Keith Lee
Apache Fluss (Incubating) Committer
Anton Borisov
Contributor of Apache Fluss (Incubating)

If you maintain a data system that only speaks Java, you will eventually hear from someone who doesn't. A Python team building a feature store. A C++ service that needs sub-millisecond writes. An AI agent that wants to call your system through a tool binding. They all need the same capabilities (writes, reads, lookups) and none of them want to spin up a JVM to get them.

Apache Fluss, streaming storage for real-time analytics and AI, hit this exact inflection point. The Java client works well for Flink-based compute, where the JVM is already the world you live in. But outside that world, asking consumers to run a JVM sidecar just to write a record or look up a key creates friction that compounds across every service, every pipeline, every agent in the stack.

We could have written a separate client for each language, maintaining five copies of the wire protocol, five implementations of the batching logic, and five sets of retry semantics and idempotence tracking. That path scales linearly with languages and ends predictably: the Java client gets features first, the Python client gets them six months later with slightly different edge-case behavior, and the C++ client is perpetually "almost done."

We took a different path, one that leans on the lessons of the greats.

Announcing Apache Fluss (Incubating) Rust, Python, and C++ Client 0.1.0 Release

Luo Yuxia
PPMC member of Apache Fluss (Incubating)
Keith Lee
Apache Fluss (Incubating) Committer
Anton Borisov
Contributor of Apache Fluss (Incubating)

We are excited to announce the release of fluss-rust clients 0.1.0, the first official release of the Rust, Python, and C++ clients for Apache Fluss. This 0.1.0 release represents the culmination of 210+ commits from the community, delivering a feature-rich multi-language client from the ground up.

Under the hood, all three clients share a single Rust core that handles protocol negotiation, batching, retries, and Apache Arrow-based data exchange, with thin language-specific bindings on top. This was a deliberate community decision to deliver native performance and feature parity across every language from day one.
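
The shape of that design can be sketched in a few lines. The example below is written in Python purely for illustration: the real core is Rust, and none of the class or method names here (`SharedCore`, `DictWriter`, `TupleWriter`) come from the Fluss clients. It shows only the idea that batching and retries live in one place, while each binding does nothing more than adapt its caller's data shape.

```python
class SharedCore:
    """Stand-in for the shared core: batching and retries are written once."""

    def __init__(self, batch_size=3, max_retries=2, send=None):
        self.batch_size = batch_size
        self.max_retries = max_retries
        self._send = send or (lambda batch: None)  # hypothetical transport hook
        self._pending = []
        self.sent_batches = []

    def append(self, record):
        self._pending.append(record)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        for attempt in range(self.max_retries + 1):
            try:
                self._send(batch)
                self.sent_batches.append(batch)
                return
            except ConnectionError:
                if attempt == self.max_retries:
                    raise  # retries exhausted; surface the error to the caller


class DictWriter:
    """Thin binding: accepts dict rows and delegates everything else."""

    def __init__(self, core):
        self._core = core

    def write(self, row):
        self._core.append(dict(row))


class TupleWriter:
    """Thin binding with a different calling convention, same core."""

    def __init__(self, core, field_names):
        self._core = core
        self._field_names = list(field_names)

    def write(self, *values):
        self._core.append(dict(zip(self._field_names, values)))


core = SharedCore(batch_size=2)
DictWriter(core).write({"id": 1, "name": "x"})
TupleWriter(core, ["id", "name"]).write(2, "y")
print(core.sent_batches)
```

Because neither writer contains batching or retry code, a change to either behavior in the core reaches every binding at once, which is the parity argument the paragraph above makes.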