What is a State Store in Kafka Streams?

In stream processing, some actions are stateless, like converting string cases or filtering records. However, many production operations are **stateful**. If you need to count how many orders a user placed today, join two streams, or calculate a running average, your application must remember past data.

In standard architectures, you would query an external database (like Redis or MySQL) for every event. In Kafka Streams, you use **Local State Stores** to process stateful data at memory speeds.

Kafka Streams State Store and Changelog Architecture

Real-World Analogy: The Chef's Order Sheet

Imagine a chef cooking in a busy kitchen:

If they had to walk to the back-office filing cabinet database every time they added a pinch of salt to check the customer's request history, they would cook very slowly.

Instead, they stick a **magnetic notepad (State Store)** on the refrigerator next to the stove. They quickly update and read order tallies from this notepad in milliseconds. If the kitchen catches fire (container crash), the manager recreates the notepad by reading a duplicate ledger kept at the front desk (Changelog Topic).

How State Stores Work

A State Store is a local database embedded directly inside the running process of your Kafka Streams application task. It provides two main benefits:

Performance: Because the database is local, read and write times are measured in microseconds rather than network milliseconds.
Decoupling: You don't have to manage or scale external database clusters to handle stream metrics.

RocksDB: The Default Engine

By default, Kafka Streams uses **RocksDB** (an embedded, high-performance key-value store developed by Facebook) to write local state files to disk. You can also configure in-memory state stores if you have a small dataset and can afford to lose it on container restarts.

Fault Tolerance: Changelog Topics

Since local disks can crash, how does Kafka Streams protect the state store?

Whenever your application writes to a local state store, the Kafka Streams client automatically writes a duplicate change record to an internal Kafka topic called a **Changelog Topic**. If your application container crashes, a new container boots up, reads the changelog topic from the beginning, and recreates the exact state store on its new local disk.

Creating a Custom State Store in Java

While operations like count() create state stores automatically, you can declare a custom store programmatically:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// Build a persistent RocksDB key-value store named "user-balance-store"
StoreBuilder<KeyValueStore<String, Long>> balanceStoreBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("user-balance-store"),
        Serdes.String(),
        Serdes.Long()
    );

// Register the store in your Streams topology
builder.addStateStore(balanceStoreBuilder);

Conclusion

State stores allow Kafka Streams applications to be stateful yet highly resilient. By utilizing local RocksDB files and backing them up to Kafka changelogs, your applications process complex joins and aggregations at scale with minimal latency.