Open Source · Apache 2.0

Streaming Storage for Real-Time Analytics & AI

Apache Fluss (Incubating) is an open-source, lakehouse-native streaming storage. It collapses the message broker, online KV store, stream-processing state backend, and lakehouse cold store into a single coherent foundation, making the Lakehouse truly real-time.

-- Register Apache Fluss as a Flink catalog
CREATE CATALOG fluss_catalog WITH (
    'type' = 'fluss',
    'bootstrap.servers' = 'coordinator-server:9123'
);

USE CATALOG fluss_catalog;

-- Create a primary-key table
CREATE TABLE pk_table (
    shop_id BIGINT,
    user_id BIGINT,
    num_orders INT,
    PRIMARY KEY (shop_id, user_id) NOT ENFORCED
) WITH ('bucket.num' = '4');

INSERT INTO pk_table VALUES (1234, 1234, 1);

SELECT * FROM pk_table WHERE shop_id = 1234;

Architecture

Unlocking the Streamhouse Architecture

Figure: Apache Fluss architecture. Sources on the left (databases, CDC streams, event logs, IoT/clickstreams) feed the Fluss hot tier in the centre, which is composed of a Coordinator Server and a row of Tablet Servers. Data continuously tiers down to a Lakehouse cold tier (Apache Paimon, Apache Iceberg, Lance) via a Tiering Service. Read patterns on the right include streaming reads, batch reads, lookup joins, and a union read that merges hot and cold. Query engines along the bottom include Apache Flink, Apache Spark, Trino, StarRocks, DuckDB, and Ray.

Diagram labels:
  • 01 · Sources · CDC Streams (Postgres · MySQL · Oracle · MongoDB), Event Streams (Device · Web · Mobile), AI Workloads (Features · Embeddings · Multimodal · Agents), Apache Flink / Spark, Apache Fluss Clients
  • 02 · Apache Fluss · Hot Tier · Sub-second freshness · Columnar log · Changelog stream
  • Coordinator Server · Metadata · Placement · Failover
  • Tablet Servers · Node 01 to Node N, each with a Log Table and a PK Table
  • Tiering Service · Flink Job · Continuous Compaction
  • Lakehouse · Cold Tier · Open formats · Long retention · Query-engine native (Apache Paimon, Apache Iceberg, Lance)
  • 03 · Read Patterns · Streaming Reads (Changelog stream · Incremental), Batch Reads (Snapshot scan · Time travel), Lookup Join (Key/Value Lookups · PK Tables), Union Read (Hot & Cold Data · Single query)
  • 04 · Query Engines · Apache Flink, Apache Spark, Trino, StarRocks, DuckDB

The multiple-systems tax

Five systems, four integrations, continuous engineering tax.

A conventional real-time AI stack stitches together a message broker, a stream processor, an online store, an offline store, and a synchronization layer. Every boundary is an integration point where data silently diverges. Apache Fluss collapses that stack into one substrate.

Before · fragmented stack
Message broker
Kafka, for event transport.
Stream processor
Flink or Spark, for derived features and aggregations.
Online store
Redis or DynamoDB, for sub-millisecond lookup.
Offline store
Iceberg or Parquet on S3, for training and history.
Sync layer
Bespoke pipelines and freshness monitors that drift silently.

5 Systems · 4 Sync Boundaries · Continuous Engineering Tax

After · unified substrate
Apache Fluss
One columnar streaming store designed for the real-time AI data plane.
  • Streaming Log · Durable, replayable, offset-ordered streams
  • PK Lookup · Sub-millisecond key/value serving
  • Streamhouse · Real-time data layer for Lakehouse architecture
  • State Store · Externalized state for joins and aggregations
  • Multi-Modal · Lance integration for vectors and ML context
  • Audit Trail · Change data feed, replayable by design

1 Substrate · 0 Sync Boundaries · Single Source of Truth

Six capability pillars

The benefits, grounded in the architecture.

Each pillar is a direct consequence of a specific architectural mechanism. Together they collapse the fragmented real-time stack into a single coherent foundation.

Unified Architecture

One system for messaging, applications, analytics, and AI.

Replaces the message queue, key-value store, and OLAP engine with a single platform serving transport, lookups, and queries from the same data.

Architectural basis: Dual representation of PK Tables (Log Store & KV Store).
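
To make the dual representation concrete, here is a minimal Python sketch of the idea, not Fluss code: a single upsert appends to a changelog and updates a key/value view, so streaming reads and point lookups are served from the same write. The class `ToyPkTable`, its methods, and the `+I` / `-U` / `+U` change markers (borrowed from Flink's changelog notation) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[Any, ...]
Row = Dict[str, Any]


@dataclass
class ToyPkTable:
    """Hypothetical toy model: every write feeds a log and a KV view."""

    key_fields: Tuple[str, ...]
    log: List[Tuple[str, Row]] = field(default_factory=list)  # (change kind, row)
    kv: Dict[Key, Row] = field(default_factory=dict)          # latest row per key

    def upsert(self, row: Row) -> None:
        key = tuple(row[f] for f in self.key_fields)
        old = self.kv.get(key)
        if old is None:
            self.log.append(("+I", dict(row)))   # first write for this key
        else:
            self.log.append(("-U", dict(old)))   # retract the previous value
            self.log.append(("+U", dict(row)))   # emit the new value
        self.kv[key] = dict(row)

    def lookup(self, *key: Any) -> Optional[Row]:
        """Point lookup served from the KV view."""
        return self.kv.get(tuple(key))


table = ToyPkTable(key_fields=("shop_id", "user_id"))
table.upsert({"shop_id": 1234, "user_id": 1234, "num_orders": 1})
table.upsert({"shop_id": 1234, "user_id": 1234, "num_orders": 2})

latest = table.lookup(1234, 1234)            # what a lookup join would see
change_kinds = [kind for kind, _ in table.log]  # what a streaming reader would see
```

In this toy model the lookup returns only the latest row, while the log keeps every change in order, which is the property that lets one table serve both transport and serving workloads.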

Stream & Lakehouse Unification

One copy of data across real-time and batch layers.

Hot and cold tiers share the same schema and are queryable as one substrate, so streaming and historical reads hit one source of truth.

Architectural basis: Tiering Service and Union Read across Iceberg, Paimon, and Lance.
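
A union read can be pictured as a merge of a cold-tier snapshot with the hot-tier changes that arrived after it. The Python sketch below illustrates that idea under simplifying assumptions: the function `union_read`, the `(offset, op, row)` log layout, and the single `snapshot_offset` watermark are hypothetical, not the Fluss implementation.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]
Change = Tuple[int, str, Row]  # (log offset, "upsert" or "delete", row)


def union_read(
    cold_rows: Iterable[Row],
    snapshot_offset: int,
    hot_log: Iterable[Change],
    key_fields: Sequence[str],
) -> List[Row]:
    """Overlay hot changes newer than the snapshot onto the cold rows."""

    def key_of(row: Row) -> Tuple[Any, ...]:
        return tuple(row[f] for f in key_fields)

    merged = {key_of(r): dict(r) for r in cold_rows}
    for offset, op, row in hot_log:
        if offset <= snapshot_offset:
            continue  # already reflected in the cold snapshot
        if op == "delete":
            merged.pop(key_of(row), None)
        else:
            merged[key_of(row)] = dict(row)
    return [merged[k] for k in sorted(merged)]


cold = [
    {"shop_id": 1, "user_id": 1, "num_orders": 3},
    {"shop_id": 2, "user_id": 7, "num_orders": 1},
]
hot = [
    (9, "upsert", {"shop_id": 1, "user_id": 1, "num_orders": 3}),   # tiered already
    (11, "upsert", {"shop_id": 1, "user_id": 1, "num_orders": 4}),
    (12, "delete", {"shop_id": 2, "user_id": 7, "num_orders": 1}),
    (13, "upsert", {"shop_id": 3, "user_id": 5, "num_orders": 1}),
]
result = union_read(cold, snapshot_offset=10, hot_log=hot, key_fields=("shop_id", "user_id"))
```

The offset check is what keeps the two tiers from double-counting: anything at or below the snapshot watermark is read from the cold tier only.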

Compute / Storage Separation

Lean, elastic, stateless compute with fast recovery.

Stateless compute recovers in seconds and runs up to 85% cheaper than Kafka-based topologies. State lives on the Fluss leader, not Flink slots.

Architectural basis: Stateless compute model with leader-resident state and KV snapshots.

Columnar Streaming Analytics

Pruning that compounds.

Server-side projection, predicate pushdown, and partition pruning on Arrow-format streams compound into order-of-magnitude I/O and network savings.

Architectural basis: ARROW log format and the compound pruning stack on the TabletServer.
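
The compounding effect can be illustrated with a toy columnar scan in Python. This is a sketch under stated assumptions, not the TabletServer code: the function `scan`, the partition names, and the dict-of-lists batch layout are hypothetical stand-ins for Arrow batches.

```python
from typing import Any, Callable, Dict, List, Sequence, Set

Batch = Dict[str, List[Any]]  # column name -> column values (columnar layout)


def scan(
    partitions: Dict[str, Batch],
    wanted_partitions: Set[str],
    columns: Sequence[str],
    predicate_column: str,
    predicate: Callable[[Any], bool],
) -> List[Dict[str, Any]]:
    """Apply partition pruning, then a predicate, then projection."""
    rows: List[Dict[str, Any]] = []
    for name, batch in partitions.items():
        if name not in wanted_partitions:
            continue  # partition pruning: the whole batch is skipped
        # Predicate: only the filter column is inspected to pick row positions.
        keep = [i for i, v in enumerate(batch[predicate_column]) if predicate(v)]
        # Projection: only the requested columns are materialized.
        rows.extend({c: batch[c][i] for c in columns} for i in keep)
    return rows


partitions = {
    "2024-01-01": {"shop_id": [1, 2], "user_id": [10, 20], "num_orders": [5, 1]},
    "2024-01-02": {"shop_id": [1, 3], "user_id": [11, 30], "num_orders": [2, 9]},
}
selected = scan(
    partitions,
    wanted_partitions={"2024-01-02"},
    columns=["shop_id", "num_orders"],
    predicate_column="num_orders",
    predicate=lambda n: n > 5,
)
```

Each step shrinks the input of the next: one of two partitions is read, one of its two rows passes the filter, and two of three columns are returned, so the savings multiply instead of adding up.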

Feature & Context Stores

Multi-modal data on one substrate, ready for ML and AI.

Row, columnar, and vector data on one store. Online features, RAG context, and analytics collapse into one PK Table accessed through different views.

Architectural basis: Unified substrate spanning structured features and vector context.

Ecosystem Openness

Open formats. No vendor lock-in.

Readable by Flink, Spark, Trino, StarRocks, and DuckDB. The hot tier is native to Fluss, and the cold tier uses Iceberg, Paimon, and Lance, so formats are open end to end.

Architectural basis: Open lake formats throughout, governed at the Apache Software Foundation.

Apache Fluss vs Apache Kafka

Where Streams Meet The Lakehouse

Kafka is the streaming transport. Fluss is the streaming storage. If your need is large-scale stream processing with Flink, real-time analytics, AI/ML, or a sub-second lakehouse, Fluss is the shared streaming storage substrate behind all of them. Read the full breakdown to see which fits your stack.

Community

Built in the open, governed by the ASF.

Apache Fluss is developed openly by a global community of contributors. Join the discussion, file an issue, or send a patch.

Apache 2.0 · Open-source license
ASF · Apache Software Foundation governance