
Fluss Joins the Apache Incubator

Jark Wu
PPMC member of Apache Fluss (Incubating)

On June 5th, Fluss, the next-generation streaming storage project open-sourced and donated by Alibaba, successfully passed the vote and officially became an incubating project of the Apache Software Foundation (ASF). This is a significant milestone for the Fluss community and marks the start of a more open, vendor-neutral, and standardized phase for the project. Moving forward, Fluss will leverage the ASF ecosystem to accelerate building a global developer community and to keep driving the innovation and adoption of next-generation real-time data infrastructure.


Apache Fluss Java Client: A Deep Dive

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)


Introduction

Apache Fluss is a streaming data storage system built for real-time analytics, serving as a low-latency data layer in modern data lakehouses. It supports sub-second streaming reads and writes, stores data in a columnar format for efficiency, and offers two table types: append-only Log Tables and updatable Primary Key Tables. In practice, this means Fluss can ingest high-throughput event streams into log tables while keeping reference data or state up to date in primary key tables. This combination suits scenarios like IoT, where you stream sensor readings and look up information about those sensors in real time, without an external key-value store.
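To make the difference between the two table types concrete, here is a minimal in-memory sketch, assuming nothing about the real Fluss client: `LogTable`, `PrimaryKeyTable`, `SensorReading`, `SensorInfo`, and `enrich` are hypothetical names used only for illustration. The log table keeps every record in arrival order, the primary key table keeps only the latest row per key, and each reading is enriched with a point lookup by key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory models of the two table types; NOT the Fluss API.

@dataclass(frozen=True)
class SensorReading:
    sensor_id: str
    temperature: float

@dataclass(frozen=True)
class SensorInfo:
    sensor_id: str  # primary key
    location: str

class LogTable:
    """Append-only: every write is kept, in arrival order."""

    def __init__(self) -> None:
        self.records: List[SensorReading] = []

    def append(self, record: SensorReading) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the appended record

class PrimaryKeyTable:
    """Updatable: a write for an existing key replaces the previous row."""

    def __init__(self) -> None:
        self.rows: Dict[str, SensorInfo] = {}

    def upsert(self, row: SensorInfo) -> None:
        self.rows[row.sensor_id] = row

    def lookup(self, key: str) -> Optional[SensorInfo]:
        return self.rows.get(key)

def enrich(readings: LogTable, sensors: PrimaryKeyTable) -> List[Tuple[str, float, Optional[str]]]:
    """Attach the latest known location to each reading (None if the sensor is unknown)."""
    result = []
    for r in readings.records:
        info = sensors.lookup(r.sensor_id)
        result.append((r.sensor_id, r.temperature, info.location if info else None))
    return result

readings, sensors = LogTable(), PrimaryKeyTable()
sensors.upsert(SensorInfo("s1", "hall"))
sensors.upsert(SensorInfo("s1", "roof"))  # same key: replaces "hall"
readings.append(SensorReading("s1", 21.5))
readings.append(SensorReading("s2", 19.0))
print(enrich(readings, sensors))  # [('s1', 21.5, 'roof'), ('s2', 19.0, None)]
```

The lookup always sees the most recent row for a key, which is the property that lets a primary key table stand in for a separate key-value store in this kind of enrichment.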

Tiering Service Deep Dive

GUO Yang
Fluss Contributor

Background

At the core of Fluss’s Lakehouse architecture sits the Tiering Service: a policy-driven data pipeline that bridges your real-time Fluss cluster and your cost-efficient lakehouse storage. It continuously reads fresh events from the Fluss cluster and migrates older or less frequently accessed data into colder storage tiers without interrupting ongoing queries. By balancing hot, warm, and cold storage according to configurable rules, the Tiering Service keeps recent data instantly queryable while archiving historical records economically.

In this blog post, we take a deep dive into how Fluss’s Tiering Service orchestrates data movement, preserves consistency, and enables scalable, high-performance analytics at optimized cost.

Announcing Fluss 0.7

Jark Wu
PPMC member of Apache Fluss (Incubating)


🌊 We are excited to announce the official release of Fluss 0.7!

This release brings extensive improvements in stability, architecture, performance, and security, further enhancing its readiness for production environments. Over the past three months, we have merged more than 250 commits, making this release a significant milestone on the road to a mature, production-grade streaming storage platform.

Understanding Partial Updates

Giannis Polyzos
PPMC member of Apache Fluss (Incubating)


Traditional streaming data pipelines often need to join many tables or streams on a primary key to create a wide view. For example, imagine you’re building a real-time recommendation engine for an e-commerce platform. To serve highly personalized recommendations, your system needs a complete 360° view of each user, including: user preferences, past purchases, clickstream behavior, cart activity, product reviews, support tickets, ad impressions, and loyalty status.

That’s at least 8 different data sources, each producing updates independently.
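One way to avoid joining all of those sources is to let each one write only the columns it owns into a single wide row keyed by user ID. The sketch below models that idea in memory, assuming a simple "merge the supplied columns, keep the rest" rule; `PartialUpdateTable` and `partial_update` are hypothetical names, not the Fluss client API, and the column names are illustrative.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]

class PartialUpdateTable:
    """Hypothetical primary-key table where each write updates only the columns it carries."""

    def __init__(self, primary_key: str, columns: Iterable[str]) -> None:
        self.primary_key = primary_key
        self.columns = list(columns)
        self.rows: Dict[Any, Row] = {}

    def partial_update(self, update: Row) -> Row:
        key = update[self.primary_key]  # every update must carry the primary key
        unknown = set(update) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        # Start from the existing wide row, or an all-None row for a new key.
        current = self.rows.get(key, {c: None for c in self.columns})
        merged = {**current, **update}  # only the columns in this update change
        self.rows[key] = merged
        return merged

users = PartialUpdateTable("user_id", ["user_id", "preferences", "cart_items", "loyalty_status"])
users.partial_update({"user_id": 42, "preferences": ["shoes"]})           # e.g. profile service
users.partial_update({"user_id": 42, "cart_items": 3})                   # e.g. cart events
wide = users.partial_update({"user_id": 42, "loyalty_status": "gold"})  # e.g. loyalty system
print(wide)
# {'user_id': 42, 'preferences': ['shoes'], 'cart_items': 3, 'loyalty_status': 'gold'}
```

Because each source only touches its own columns, no component has to hold join state for all eight streams; the storage layer assembles the wide view as updates arrive.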

The Story of Fluss Logo

Jark Wu
PPMC member of Apache Fluss (Incubating)

Introducing the Little Otter

Today is World Otter Day, and we are thrilled to introduce the little otter to the Fluss community! 🎉

Since Fluss was open-sourced half a year ago, many community members and friends have asked us: "When will Fluss get a logo?" After more than a month of careful design work and over 30 iterations, we’re excited to finally unveil the official Fluss logo: a surfing otter! 🦦🌊

Announcing Fluss 0.6

Jark Wu
PPMC member of Apache Fluss (Incubating)

The Fluss community is pleased to announce the official release of Fluss 0.6.0. This release is the result of more than three months of intensive development, bringing together the expertise and efforts of 45 contributors worldwide across more than 200 commits. Our heartfelt thanks go out to every contributor for their invaluable support!


Towards A Unified Streaming & Lakehouse Architecture

Luo Yuxia
PPMC member of Apache Fluss (Incubating)

The unification of Lakehouse and streaming storage represents a major trend in the future development of modern data lakes and streaming storage systems. Designed specifically for real-time analytics, Fluss has embraced a unified Streaming and Lakehouse architecture from its inception, enabling seamless integration into existing Lakehouse architectures.

Fluss is designed to address the demands of real-time analytics with the following key capabilities:

  • Real-Time Stream Reading and Writing: Supports millisecond-level end-to-end latency.
  • Columnar Stream: Optimizes storage and query efficiency.
  • Streaming Updates: Enables low-latency updates to data streams.
  • Changelog Generation: Supports changelog generation and consumption.
  • Real-Time Lookup Queries: Facilitates instant lookup queries on primary keys.
  • Streaming & Lakehouse Unification: Seamlessly integrates streaming and lakehouse storage for unified data processing.
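The "Streaming Updates" and "Changelog Generation" capabilities can be pictured as a keyed table that records every change it applies. The sketch below is a minimal in-memory model, assuming Flink-style change kinds (`+I` insert, `-U`/`+U` update before/after, `-D` delete); `ChangelogTable`, `upsert`, and `delete` are hypothetical names, and this is not a description of Fluss internals.

```python
from typing import Any, Dict, List, Tuple

# Change kinds in Flink-style notation (an assumption for this sketch).
Change = Tuple[str, Dict[str, Any]]

class ChangelogTable:
    """Hypothetical primary-key table that derives a changelog from upserts and deletes."""

    def __init__(self, primary_key: str) -> None:
        self.primary_key = primary_key
        self.rows: Dict[Any, Dict[str, Any]] = {}
        self.changelog: List[Change] = []

    def upsert(self, row: Dict[str, Any]) -> None:
        row = dict(row)  # copy so later caller mutations don't leak into stored state
        key = row[self.primary_key]
        old = self.rows.get(key)
        if old is None:
            self.changelog.append(("+I", row))  # first row for this key
        else:
            self.changelog.append(("-U", old))  # retract the previous version
            self.changelog.append(("+U", row))  # emit the new version
        self.rows[key] = row

    def delete(self, key: Any) -> None:
        old = self.rows.pop(key, None)
        if old is not None:  # deleting a missing key produces no change
            self.changelog.append(("-D", old))

t = ChangelogTable("id")
t.upsert({"id": 1, "v": "a"})
t.upsert({"id": 1, "v": "b"})
t.delete(1)
for kind, row in t.changelog:
    print(kind, row)
# +I {'id': 1, 'v': 'a'}
# -U {'id': 1, 'v': 'a'}
# +U {'id': 1, 'v': 'b'}
# -D {'id': 1, 'v': 'b'}
```

Emitting the previous version alongside the new one lets downstream consumers, such as a streaming aggregation, retract old contributions instead of recomputing from scratch.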

Introducing Fluss: Streaming Storage for Real-Time Analytics

Jark Wu
PPMC member of Apache Fluss (Incubating)

We discussed the challenges of using Kafka for real-time analytics in our previous blog post. Today, we are excited to introduce Fluss, a cutting-edge streaming storage system designed to power real-time analytics. In this post, we explore Fluss's architecture, design principles, and key features, and how it addresses the challenges of using Kafka for real-time analytics.

Why Fluss? Top 4 Challenges of Using Kafka for Real-Time Analytics

Jark Wu
PPMC member of Apache Fluss (Incubating)

The industry is undergoing a clear and significant shift as big data computing transitions from offline to real-time processing. This transition is transforming sectors such as e-commerce, connected vehicles, and finance, where real-time data applications are becoming integral to operations. It enables organizations to unlock greater value by using real-time insights to drive business impact and improve decision-making.