Version: 0.9 🚧

Spark Connector Options

This page lists all the available options for the Fluss Spark connector.

Read Options

The following Spark configurations control read behavior for both batch and streaming reads. Set them with SET in Spark SQL or with spark.conf.set() in Spark applications. All option names start with the spark.sql.fluss. prefix.
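
As a rough illustration of the two ways to set these options, the sketch below applies a set of read options through spark.conf.set() and renders the same options as SET statements. The helper names (qualify, apply_fluss_options, to_set_statements) are hypothetical and not part of the connector; the spark argument is assumed to be an existing SparkSession, which is not created here.

```python
FLUSS_PREFIX = "spark.sql.fluss."


def qualify(name):
    """Return the full option key, adding the Fluss prefix if it is missing."""
    return name if name.startswith(FLUSS_PREFIX) else FLUSS_PREFIX + name


def apply_fluss_options(spark, options):
    """Set each option on the session through spark.conf.set()."""
    for name, value in options.items():
        spark.conf.set(qualify(name), value)


def to_set_statements(options):
    """Render the same options as Spark SQL SET statements."""
    return [f"SET {qualify(name)}={value}" for name, value in options.items()]


# Option values are plain strings, as they would appear in a SET statement.
read_options = {
    "scan.startup.mode": "earliest",
    "scan.poll.timeout": "10000ms",
}
statements = to_set_statements(read_options)

# With a live session (assumed to exist, not created in this sketch):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   apply_fluss_options(spark, read_options)   # programmatic route
#   for stmt in statements:                    # Spark SQL route
#       spark.sql(stmt)
```

Both routes write the same session-level configuration keys, so either can be used depending on whether the job is driven from SQL or from application code.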

Option: spark.sql.fluss.scan.startup.mode
Default: full
Description: The startup mode when reading a Fluss table. Supported values:
  • full (default): For primary key tables, reads the full snapshot and merges with log changes. For log tables, reads from the earliest offset.
  • earliest: Reads from the earliest log/changelog offset.
  • latest: Reads from the latest log/changelog offset.
  • timestamp: Reads from a specified timestamp (requires scan.startup.timestamp).
Note: For Structured Streaming read, only latest mode is currently supported.

Option: spark.sql.fluss.read.optimized
Default: false
Description: If true, Spark will only read data from the data lake snapshot or KV snapshot, without merging log changes. This can improve read performance but may return stale data for primary key tables.

Option: spark.sql.fluss.scan.poll.timeout
Default: 10000ms
Description: The timeout for the log scanner to poll records.
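
The constraints on the startup mode described above can be checked before a job starts. The following is a minimal sketch, not connector code: startup_conf is a hypothetical helper, and it assumes that the timestamp option uses the full key spark.sql.fluss.scan.startup.timestamp. The timestamp value is passed through unchanged because its expected format is not stated on this page.

```python
STARTUP_MODES = ("full", "earliest", "latest", "timestamp")


def startup_conf(mode="full", timestamp=None, streaming=False):
    """Build startup-mode settings, enforcing the documented constraints."""
    if mode not in STARTUP_MODES:
        raise ValueError(f"unsupported startup mode: {mode!r}")
    if streaming and mode != "latest":
        # Structured Streaming reads currently support only the latest mode.
        raise ValueError("Structured Streaming reads support only 'latest'")
    conf = {"spark.sql.fluss.scan.startup.mode": mode}
    if mode == "timestamp":
        if timestamp is None:
            raise ValueError("mode 'timestamp' requires scan.startup.timestamp")
        # Assumption: the key carries the spark.sql.fluss. prefix like the
        # other options; the value format is left to the caller.
        conf["spark.sql.fluss.scan.startup.timestamp"] = str(timestamp)
    return conf


batch_conf = startup_conf("earliest")
stream_conf = startup_conf("latest", streaming=True)
```

The returned dictionary holds plain key/value pairs that can be passed to spark.conf.set() one by one. Failing early on an unsupported combination avoids discovering it only when the read is planned.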