Apache Fluss™

Architecture

Unlocking the Streamhouse Architecture

The multiple-systems tax

Five systems, four integrations, continuous engineering tax.

A conventional real-time AI stack stitches together a message broker, a stream processor, an online store, an offline store, and a synchronization layer. Every boundary between them is an integration point where data can silently diverge. Apache Fluss collapses that stack into one substrate.

Before · fragmented stack

Message broker

Kafka, for event transport.

Stream processor

Flink or Spark, for derived features and aggregations.

Online store

Redis or DynamoDB, for sub-millisecond lookup.

Offline store

Iceberg or Parquet on S3, for training and history.

Sync layer

Bespoke pipelines and freshness monitors that drift silently.

5 Systems · 4 Sync Boundaries · Continuous Engineering Tax

After · unified substrate

Apache Fluss

One columnar streaming store designed for the real-time AI data plane.

Streaming Log · Durable, replayable, offset-ordered streams
PK Lookup · Sub-millisecond key/value serving
Streamhouse · Real-time data layer for lakehouse architectures
State Store · Externalized state for joins and aggregations
Multi-Modal · Lance integration for vectors and ML context
Audit Trail · Change data feed, replayable by design

1 Substrate · 0 Sync Boundaries · Single Source of Truth

Six capability pillars

The benefits, grounded in the architecture.

Each pillar is a direct consequence of a specific architectural mechanism. Together they collapse the fragmented real-time stack into a single coherent foundation.

Unified Architecture

One system for messaging, applications, analytics, and AI.

Replaces the message queue, key-value store, and OLAP engine with a single platform serving transport, lookups, and queries from the same data.

Architectural basis: Dual representation of PK Tables (Log Store & KV Store).
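The dual representation can be pictured with a toy, in-memory model. This is not the Fluss client API: `ToyPkTable`, `upsert`, `lookup`, `scan_log`, and `replay` are hypothetical names, and the change-type labels borrow Flink's changelog notation (`+I`, `-U`, `+U`, `-D`). Each write updates a latest-value map and appends change records to an offset-ordered log, so one table answers both point lookups and streaming reads.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# A change record: (offset, change_type, key, row). Change types follow the
# Flink changelog convention: +I insert, -U update-before, +U update-after, -D delete.
Change = Tuple[int, str, Any, Optional[dict]]


@dataclass
class ToyPkTable:
    """Hypothetical in-memory model of a primary-key table kept in two forms:
    an append-only changelog (log view) and a latest-value map (KV view)."""

    log: List[Change] = field(default_factory=list)
    kv: Dict[Any, dict] = field(default_factory=dict)

    def _append(self, change_type: str, key: Any, row: Optional[dict]) -> None:
        # Offsets are assigned in append order, so the log is offset-ordered.
        self.log.append((len(self.log), change_type, key, row))

    def upsert(self, key: Any, row: dict) -> None:
        old = self.kv.get(key)
        if old is None:
            self._append("+I", key, row)
        else:
            # An update is logged as a retraction of the old row plus the new row.
            self._append("-U", key, old)
            self._append("+U", key, row)
        self.kv[key] = row

    def delete(self, key: Any) -> None:
        old = self.kv.pop(key, None)
        if old is not None:
            self._append("-D", key, old)

    def lookup(self, key: Any) -> Optional[dict]:
        # Point lookup, served from the KV view.
        return self.kv.get(key)

    def scan_log(self, from_offset: int = 0) -> List[Change]:
        # Streaming read, served from the log view starting at any offset.
        return [c for c in self.log if c[0] >= from_offset]


def replay(changes: List[Change]) -> Dict[Any, dict]:
    """Rebuild the KV view from the changelog alone (the audit-trail property)."""
    state: Dict[Any, dict] = {}
    for _offset, change_type, key, row in changes:
        if change_type in ("+I", "+U"):
            state[key] = row
        elif change_type == "-D":
            state.pop(key, None)
        # "-U" needs no action here: the "+U" that follows overwrites the key.
    return state
```

In this model, two `upsert` calls on the same key leave `lookup` returning the latest row, while `scan_log()` yields `+I`, `-U`, `+U` for that key, and `replay(t.log)` reproduces `t.kv`. Keeping both views current on every write is what removes the need for a separate sync pipeline between a stream and a key/value store.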

Stream & Lakehouse Unification

One copy of data across real-time and batch layers.

Hot and cold tiers share the same schema and are queryable as one substrate, so streaming and historical reads hit one source of truth.

Architectural basis: Tiering Service and Union Read across Iceberg, Paimon, and Lance.
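The idea behind Union Read can be sketched for an append-only stream, assuming a single tiering boundary offset: records below the boundary are read from the lake, records at or above it from the hot log. This is a conceptual sketch only; `union_read`, `tiered_offset`, and the record layout are hypothetical, and a primary-key table would need to merge changes rather than simply concatenate records.

```python
from typing import List, Tuple

# A record in an append-only stream: (log offset, row).
Record = Tuple[int, dict]


def union_read(lake_records: List[Record], hot_log: List[Record],
               tiered_offset: int) -> List[Record]:
    """Hypothetical sketch of a union read over an append-only stream.

    Offsets below `tiered_offset` have already been written to the lake
    (cold tier); offsets at or above it are served from the hot log. The hot
    log may still retain records that were already tiered, so they are skipped
    to return each offset exactly once.
    """
    if hot_log and min(offset for offset, _ in hot_log) > tiered_offset:
        # Neither tier holds the offsets in between: refuse rather than return a gap.
        raise ValueError("gap between cold and hot tiers")
    cold = [r for r in lake_records if r[0] < tiered_offset]
    hot = [r for r in hot_log if r[0] >= tiered_offset]
    return sorted(cold + hot, key=lambda r: r[0])


# Example: the lake holds offsets 0-5, the hot log still retains offsets 4-8.
lake = [(i, {"v": i}) for i in range(6)]
hot = [(i, {"v": i}) for i in range(4, 9)]
merged = union_read(lake, hot, tiered_offset=6)
print([offset for offset, _ in merged])  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The boundary offset decides which tier answers for each record, so the overlap between tiers is read once and nothing between them is missed.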

Compute / Storage Separation

Lean, elastic, stateless compute with fast recovery.

Stateless compute recovers in seconds and runs up to 85% cheaper than Kafka-based topologies. State lives on the Fluss leader, not in Flink slots.

Architectural basis: Stateless compute model with leader-resident state and KV snapshots.

Columnar Streaming Analytics

Pruning that compounds.

Server-side projection, predicate pushdown, and partition pruning on Arrow-format streams compound into order-of-magnitude I/O and network savings.

Architectural basis: ARROW log format and the compound pruning stack on the TabletServer.
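Why the three prunes compound can be shown with a toy scan over columnar batches. This is not the TabletServer implementation or an Arrow API: `Batch`, `pruned_scan`, the partition names, and the data are hypothetical, and per-batch min/max statistics are an assumption used to model predicate pushdown. The count of values touched stands in for I/O.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Batch:
    """Hypothetical columnar batch: column name -> values (all columns equal length)."""

    columns: Dict[str, List[int]]
    stats: Dict[str, Tuple[int, int]] = field(init=False)

    def __post_init__(self) -> None:
        # Per-column (min, max), computed once at write time and kept as metadata.
        self.stats = {c: (min(v), max(v)) for c, v in self.columns.items()}

    def num_rows(self) -> int:
        return len(next(iter(self.columns.values())))


def pruned_scan(partitions: Dict[str, List[Batch]], wanted: Sequence[str],
                projection: Sequence[str], pred_col: str, lo: int, hi: int):
    """Return (rows, cells_read) for `pred_col BETWEEN lo AND hi`, applying:
    1. partition pruning: skip partitions not listed in `wanted`;
    2. predicate pushdown: skip batches whose min/max on `pred_col` miss [lo, hi];
    3. projection: touch only the projected columns plus `pred_col`.
    `cells_read` counts values touched, as a rough proxy for I/O."""
    needed = list(dict.fromkeys([*projection, pred_col]))
    rows: List[Dict[str, int]] = []
    cells_read = 0
    for name, batches in partitions.items():
        if name not in wanted:
            continue
        for batch in batches:
            bmin, bmax = batch.stats[pred_col]
            if bmax < lo or bmin > hi:
                continue
            n = batch.num_rows()
            cells_read += n * len(needed)
            for i in range(n):
                if lo <= batch.columns[pred_col][i] <= hi:
                    rows.append({c: batch.columns[c][i] for c in projection})
    return rows, cells_read


def make_batch(ts_start: int) -> Batch:
    # Illustrative data: three rows, four integer columns.
    return Batch({"ts": [ts_start, ts_start + 1, ts_start + 2],
                  "a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})


parts = {"2024-01-01": [make_batch(0), make_batch(10)],
         "2024-01-02": [make_batch(20), make_batch(30)]}
full_scan_cells = sum(b.num_rows() * len(b.columns) for bs in parts.values() for b in bs)
rows, cells = pruned_scan(parts, ["2024-01-01"], ["a"], "ts", 10, 11)
print(rows, cells, full_scan_cells)  # [{'a': 1}, {'a': 2}] 6 48
```

Each prune multiplies with the others: partition pruning keeps half the data, min/max skipping keeps half of what remains, and projection touches two of four columns, so 48 × ½ × ½ × ½ = 6 values are read.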

Feature & Context Stores

Multi-modal data on one substrate, ready for ML and AI.

Row, columnar, and vector data live in one store. Online features, RAG context, and analytics collapse into one PK Table accessed through different views.

Architectural basis: Unified substrate spanning structured features and vector context.

Ecosystem Openness

Open formats. No vendor lock-in.

Readable by Flink, Spark, Trino, StarRocks, and DuckDB. The native hot tier pairs with Iceberg, Paimon, or Lance for the cold tier, keeping formats open end to end.

Architectural basis: Open lake formats throughout, with the project governed by the Apache Software Foundation.

Apache Fluss vs Apache Kafka

Where Streams Meet the Lakehouse

Kafka is a streaming transport; Fluss is streaming storage. Whether you need large-scale stream processing with Flink, real-time analytics, AI/ML, or a sub-second lakehouse, Fluss serves as the shared streaming storage substrate behind all of them. Read the full breakdown to see which fits your stack.

See the full comparison

Community

Built in the open, governed by the ASF.

Apache Fluss is developed openly by a global community of contributors. Join the discussion, file an issue, or send a patch.

Apache 2.0

Open-source license

ASF

Apache Software Foundation governance

GitHub

Source code, issues, and pull requests.

Open repository →

Slack

Real-time chat with users and committers.

Join the workspace →

Contribute

Welcome guide, mailing lists, and how to send your first patch.

Get started →