Tiering Service Deep Dive Part 3: In Production

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)

Part 1 and Part 2 built up everything you need to know about how tiering behaves: the mental model, the dials, the queue dynamics, the scale-out story. This part is about what to do with all of that. What breaks at runtime, and which of those failures self-heal versus need operator action. The design mistakes that look fine on day one but come back to bite you on day two. And the operator's daily view: which five numbers tell you whether tiering is healthy on a Tuesday afternoon, where each one comes from, and why two of them can only come from your Flink-side dashboards.

Tiering Service Deep Dive, 3-parts:

  • Part 1 - The Mental Model: how one tiering round actually works, from timer fire to lake commit.
  • Part 2 - Tuning: per-table dials, multi-table dynamics, and scaling out.
  • Part 3 - In Production: failure modes, design pitfalls, and the dashboard that tells you everything is fine.

Tiering Service Deep Dive Part 2: Tuning

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)

Part 1 built the mental model. What tiering is, who does what, how the round runs end-to-end.

This part adds the dials. Buckets and splits determine how a round parallelizes. Log and PK tables behave so differently on round one that the difference deserves its own treatment. The freshness setting, the one knob most users actually touch, does two different jobs that share the same value. Once a single job is handling many tables, queue position starts to dominate effective freshness more than any per-table setting. And once that happens, you have a deployment-shape decision: stay with one job, or scale out. By the end, you'll know which levers matter most and how to use them.

Tiering Service Deep Dive Part 1: The Mental Model

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)

If you're new to Fluss, the lake-tiering story is one of those topics where every explanation seems to assume you already know how it works. This three-part walkthrough aims to bring some clarity to the confusing parts of the system, and to help you understand how it works in practice.

Part 1 builds the mental model from scratch. By the end of it, you'll be able to describe, step by step, what happens between the moment a tiering timer fires and the moment a lake snapshot is committed.

Part 2 and Part 3 take that mental model and add the dials (parallelism, table kinds, freshness, multi-table behavior, scale-out) and then put it into a real production deployment (failures, pitfalls, monitoring).

The Storage Hierarchy: Hot, Remote, and Lake

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)

Apache Fluss stores data in three places: local disk on the tablet server, remote object storage like S3, and the lakehouse. Which place holds which data at any given moment, and what is responsible for moving it between them, is the foundation everything else rests on. Your capacity plan depends on it. Your latency targets depend on it. Your disaster-recovery story depends on it. So does your ability to predict, in advance, that a particular configuration change is going to fill up local disk a week later.

How Apache Fluss Achieves True Pruning in Streaming Storage

Yunhong Zheng
PPMC member of Apache Fluss (Incubating)

TL;DR:

Apache Kafka's "column pruning" is actually pseudo-pruning. All fields still cross the network, and clients discard unwanted ones after the fact. Apache Fluss redesigns the storage format, server-side read path, and write-side batching strategy from the ground up with Arrow IPC columnar storage, zero-copy server-side pruning, and client-side pre-shuffle batching. The result: pruning 90% of columns yields a 10x read throughput improvement, with performance scaling linearly with the pruning ratio.
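
The gap between pseudo-pruning and server-side pruning can be illustrated with a toy model. This is an idealized sketch, not a benchmark and not Fluss code: it assumes fixed-width columns of equal size and that read cost is proportional to the bytes crossing the network. The function names are hypothetical.

```python
def bytes_on_wire(num_rows, num_columns, bytes_per_value,
                  selected_columns, server_side_pruning):
    """Bytes sent from server to client for one read (idealized model)."""
    # With client-side ("pseudo") pruning every column crosses the network
    # and the client discards the unwanted ones afterwards.
    columns_sent = selected_columns if server_side_pruning else num_columns
    return num_rows * columns_sent * bytes_per_value


def transfer_reduction(num_columns, selected_columns):
    """Factor by which server-side pruning shrinks the transfer in this model."""
    full = bytes_on_wire(1_000, num_columns, 8, selected_columns,
                         server_side_pruning=False)
    pruned = bytes_on_wire(1_000, num_columns, 8, selected_columns,
                           server_side_pruning=True)
    return full / pruned


# Keeping 1 of 10 columns (pruning 90%) sends one tenth of the bytes.
print(transfer_reduction(num_columns=10, selected_columns=1))  # 10.0
```

In this model the transfer shrinks by the inverse of the fraction of columns kept, which is consistent with the 10x figure quoted for 90% pruning. Real throughput also depends on encoding, batching, and CPU cost, none of which the sketch captures.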

Taobao Instant Commerce: Real-Time Decisions at Scale with Apache Fluss

Howie Wang
Data Engineering Expert of Taobao Instant Commerce

Every autumn in China, social media floods with posts about "The First Cup of Milk Tea in Autumn." With a tap on their phone, consumers expect their order delivered within 30 minutes. That effortless experience is no accident: it is the result of Taobao Instant Commerce making thousands of data-driven decisions every second.

Taobao Instant Commerce has scaled from a single-category food delivery service into a high-frequency platform spanning fresh produce, consumer electronics (3C), and beauty products. It operates under two very different modes: steady high-frequency daily transactions, and explosive traffic surges during promotional events where order volumes can multiply within minutes. Both demand the same thing: real-time responsiveness across hundreds of millions of SKUs.

Real-time is not a nice-to-have here; it is the lifeline for three critical functions:

  • Operations: Refresh conversion rates and funnels within 30 seconds.
  • Algorithms: Order prediction models must iterate at minute-level granularity.
  • Quality Assurance: Canary release anomalies must be detected within seconds and trigger instant alerts.

The existing pipeline (built on Kafka, Flink, Paimon, and StarRocks) handled this at one scale.

Note: In Alibaba's internal infrastructure, TT (TimeTunnel) is the internal equivalent of Apache Kafka, a high-throughput distributed message queue. Throughout this post, "Kafka" refers to TT in the Taobao Instant Commerce context.

But as the business grew, three fundamental bottlenecks emerged: unbounded state growth from stream joins, mounting complexity in building multi-stream denormalized tables, and excessive resource consumption from lakehouse synchronization. Together they formed an impossible triangle: no matter how the team tuned the system, latency, consistency, and cost could not all be optimized at once.

Fluss broke this impasse. By replacing the fragmented stream-batch architecture with a unified storage layer, its features (Delta Join, Partial Update, Streaming-Lakehouse Unification, Column Pruning, and Auto-Increment Columns) systematically eliminated all three bottlenecks and fundamentally reshaped how Taobao Instant Commerce handles real-time decision-making at scale.

Real-Time Multi-Dimensional Unique Visitor Deduplication in Practice

Yang Wang
Apache Fluss (Incubating) Contributor

UV (Unique Visitors) measures the count of distinct users who visited a page or triggered an event within a given time window, unlike PV (Page Views), which counts every request regardless of who made it. For any product or platform, accurate real-time UV statistics across dimensions like channel, city, date, and hour are a core analytical requirement. Every subset of the dimensions is a possible grouping, so four dimensions give 2^4 = 16 groupings; when the dimension count increases to seven, the number of possible groupings reaches 2^7 = 128.
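
The grouping counts can be checked by enumerating every subset of the dimensions. The sketch below is a plain Python illustration; the helper name `all_groupings` and the three extra dimension names are placeholders, not taken from the post.

```python
from itertools import combinations


def all_groupings(dimensions):
    """Return every subset of the dimensions, including the empty (global) grouping."""
    groupings = []
    for size in range(len(dimensions) + 1):
        groupings.extend(combinations(dimensions, size))
    return groupings


four = ["channel", "city", "date", "hour"]
print(len(all_groupings(four)))  # 16

# Seven dimensions: the three extra names are placeholders for illustration.
seven = four + ["dim5", "dim6", "dim7"]
print(len(all_groupings(seven)))  # 128
```

The count doubles with each added dimension, which is why precomputing one result per grouping quickly becomes impractical.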

How can multi-dimensional deduplication be both accurate and flexible while maintaining real-time performance? Behind this challenge lie two very different computing paradigms: direct deduplication of raw data, or set operations based on bitmaps.
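
To show what set operations on bitmaps look like, here is a minimal sketch that uses Python integers as uncompressed bitmaps. It assumes user ids are small non-negative integers; the function names and sample events are hypothetical, and this is not the implementation described in the post. Production systems typically use a compressed bitmap structure instead.

```python
def popcount(bitmap):
    """Number of set bits, i.e. the number of distinct user ids in the bitmap."""
    return bin(bitmap).count("1")


def build_bitmaps(events):
    """Build one bitmap per (channel, city), setting bit `user_id` for each visit."""
    bitmaps = {}
    for channel, city, user_id in events:
        key = (channel, city)
        bitmaps[key] = bitmaps.get(key, 0) | (1 << user_id)
    return bitmaps


def uv_by_channel(bitmaps):
    """Roll up to channel level by OR-ing the finer-grained bitmaps, then count bits."""
    merged = {}
    for (channel, _city), bitmap in bitmaps.items():
        merged[channel] = merged.get(channel, 0) | bitmap
    return {channel: popcount(bitmap) for channel, bitmap in merged.items()}


events = [
    ("app", "Hangzhou", 1),
    ("app", "Hangzhou", 1),  # repeat visit by the same user
    ("app", "Beijing", 1),   # same user seen in another city
    ("app", "Beijing", 2),
    ("web", "Hangzhou", 3),
]

bitmaps = build_bitmaps(events)
print(popcount(bitmaps[("app", "Hangzhou")]))  # 1
print(uv_by_channel(bitmaps))                  # {'app': 2, 'web': 1}
```

Summing the per-city counts for "app" would give 3 and double-count user 1. The bitwise OR merges the sets first, so the rolled-up UV stays exact without going back to the raw events.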

Why Apache Fluss Chose Rust for Its Multi-Language SDK

Luo Yuxia
PPMC member of Apache Fluss (Incubating)
Keith Lee
Apache Fluss (Incubating) Committer
Anton Borisov
Contributor of Apache Fluss (Incubating)

If you maintain a data system that only speaks Java, you will eventually hear from someone who doesn't. A Python team building a feature store. A C++ service that needs sub-millisecond writes. An AI agent that wants to call your system through a tool binding. They all need the same capabilities (writes, reads, lookups) and none of them want to spin up a JVM to get them.

Apache Fluss, streaming storage for real-time analytics and AI, hit this exact inflection point. The Java client works well for Flink-based compute, where the JVM is already the world you live in. But outside that world, asking consumers to run a JVM sidecar just to write a record or look up a key creates friction that compounds across every service, every pipeline, every agent in the stack.

We could have written a separate client for each language. Maintain five copies of the wire protocol, five implementations of the batching logic, five sets of retry semantics and idempotence tracking. That path scales linearly with languages and ends predictably: the Java client gets features first, the Python client gets them six months later with slightly different edge-case behavior, and the C++ client is perpetually "almost done."

We took a different path and tried to apply the lessons of the greats.

Announcing Apache Fluss (Incubating) Rust, Python, and C++ Client 0.1.0 Release

Luo Yuxia
PPMC member of Apache Fluss (Incubating)
Keith Lee
Apache Fluss (Incubating) Committer
Anton Borisov
Contributor of Apache Fluss (Incubating)

We are excited to announce the release of fluss-rust clients 0.1.0, the first official release of the Rust, Python, and C++ clients for Apache Fluss. This 0.1.0 release represents the culmination of 210+ commits from the community, delivering a feature-rich multi-language client from the ground up.

Under the hood, all three clients share a single Rust core that handles protocol negotiation, batching, retries, and Apache Arrow-based data exchange, with thin language-specific bindings on top. This was a deliberate community decision to deliver native performance and feature parity across every language from day one.

What does Apache Fluss mean in the context of AI?

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)

The Data Foundation for Real-Time Intelligent Systems

Apache Fluss (Incubating) started as streaming storage for real-time analytics, built to work closely with stream processors like Apache Flink. Its focus has always been on freshness, efficient analytical access, and continuous data, making fast-changing streams directly usable without forcing them through batch-oriented systems or log-only pipelines.

Over the last year, Fluss has expanded beyond this original framing. You’ll now see it described as streaming storage for real-time analytics and AI. This change reflects how data systems are being used today: more workloads depend on continuously updated data, low-latency access to evolving state, and the ability to reason over context as it changes.

In this context, “AI” does not mean training or serving models inside Fluss. It refers to the class of intelligent systems that rely on fresh features, evolving context, and real-time state to make decisions continuously. Whether those systems use traditional machine learning models, newer AI techniques, or a combination of both, they all depend on the same data foundations.

This shift explains the recent evolution of Apache Fluss. Investments in stateless compute, richer data types with zero-copy schema evolution, and vector support through Lance were driven by a single question:

What does a data foundation need to look like to support real-time intelligent systems reliably at scale?

The rest of this post answers that question. We’ll explain what AI means when viewed through the lens of Apache Fluss, and why a streaming-first foundation for features, context, and state is central to building the next generation of intelligent systems.