If you are writing a Java application to read messages from Apache Kafka, you have two primary options: write a standard Kafka Consumer or use the Kafka Streams library.

Both read data from Kafka topics, but they target very different levels of processing complexity. The comparison below shows when a basic consumer is enough and when a stateful Kafka Streams pipeline is worth building.

[Figure: Kafka Streams processing architecture]
Real-World Analogy: Courier vs. Sorting Department

Imagine managing cargo packages:

  • A Kafka Consumer is like a courier driver. Their job is simple: pick up a package from point A, put it in the truck, and drop it off at point B. They don't open the package, don't aggregate items, and don't care about what was shipped yesterday.
  • Kafka Streams is like an advanced sorting facility. When packages arrive, workers unpack them, filter out trash, group packages by size, calculate the total weight, and store details in a log file (State Store). They cross-reference items with customer database sheets (Joins) and pack the finished bundles onto a new outbound conveyor belt (Sink Topic).

Detailed Comparison

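| Aspect | Kafka Consumer | Kafka Streams API |
|---|---|---|
| Complexity | Low (a simple poll, process, commit loop) | Medium (requires understanding topologies and state stores) |
| Processing model | Typically stateless: each record is handled on its own, and any state is yours to manage | Stateful: built-in aggregations, counts, and windows |
| State store | None built in (keep state yourself, e.g., in an external database) | Built-in local state stores (RocksDB by default), backed up to Kafka changelog topics |
| Operations | Read → run custom code → commit offsets | Fluent DSL (`map`, `filter`, `join`, `windowedBy`, `aggregate`) |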
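To make the "Stateful" and "State store" rows concrete, here is a minimal sketch of a Kafka Streams topology that counts orders per key. It assumes the `orders` topic is keyed by customer ID; the class name `OrderCountApp`, the store name `order-counts`, and the output topic `order-counts-by-customer` are hypothetical. The running totals live in a named local state store instead of an external database.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Properties;

// Hypothetical example app: counts orders per key (assumed to be a customer ID).
public class OrderCountApp {

    static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        return props;
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // Group records by their existing key and keep a running count.
        // The counts live in a local store named "order-counts" (RocksDB by default),
        // which Kafka Streams backs up to a changelog topic for fault tolerance.
        KTable<String, Long> counts = builder.<String, String>stream("orders")
            .groupByKey()
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("order-counts")
                .withValueSerde(Serdes.Long()));
        // Emit every updated count to a (hypothetical) output topic.
        counts.toStream().to("order-counts-by-customer", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }

    public static void main(String[] args) {
        KafkaStreams streams = new KafkaStreams(buildTopology(), config());
        // Close the Streams instance cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

A plain consumer would have to hold these counts itself and persist them somewhere to survive restarts; here Kafka Streams restores the store from its changelog topic.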

When Should You Use a Kafka Consumer?

A standard consumer is best when your processing logic is stateless and simple. Choose a consumer if:

  • You just want to read messages and insert them directly into a database (e.g., PostgreSQL, Elasticsearch, MongoDB).
  • You are triggering external third-party API actions (e.g., sending an email or SMS notification for each message).
  • You are writing applications in non-JVM languages (Python, Go, Node.js). Kafka Streams is a Java library, and third-party stream-processing libraries for other languages are generally less mature.
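For comparison, here is a minimal sketch of the consumer side: one iteration of the read → run custom code → commit loop from the table, using the `kafka-clients` library. The class name `OrderConsumer`, the group ID `orders-db-writer`, and the placeholder processing step are hypothetical; in a real application the loop body would perform the database insert or API call.

```java
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical example: a plain consumer that handles each record independently.
public class OrderConsumer {

    // One read -> process -> commit iteration. Returns the handled values for testing.
    static List<String> pollOnce(Consumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        List<String> handled = new ArrayList<>();
        for (ConsumerRecord<String, String> record : records) {
            // Placeholder for custom, stateless logic (e.g., an INSERT into a database).
            handled.add(record.value());
        }
        // Commit the offsets from the last poll only after the whole batch is handled.
        consumer.commitSync();
        return handled;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-db-writer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Disable auto-commit so offsets are committed explicitly after processing.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                pollOnce(consumer);
            }
        }
    }
}
```

Committing after the batch is processed gives at-least-once delivery: if the process crashes before `commitSync()`, the same records are read again on restart, so the downstream write should be idempotent. Accepting the `Consumer` interface instead of `KafkaConsumer` lets the loop be exercised with the client library's `MockConsumer` without a broker.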

When Should You Use Kafka Streams?

Kafka Streams is a client library built specifically for processing streams of data. Choose Kafka Streams if:

  • You need to perform stateful operations like counting clicks per hour or summing transactions per customer.
  • You need to join two data streams together (e.g., merging an Orders stream with a Payments stream).
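  • You want to use the KStream (an event stream) and KTable (a changelog view holding the latest value per key) abstractions natively.
  • You want exactly-once processing for read-process-write pipelines within Kafka, enabled through a single configuration setting (`processing.guarantee`).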
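The join and exactly-once bullets can be sketched together. This minimal example assumes `orders` and a hypothetical `payments` topic are both keyed by the same order ID and writes matches to a hypothetical `paid-orders` topic; the class name `OrderPaymentJoinApp` is also hypothetical. It needs Kafka Streams 3.0 or later for `JoinWindows.ofTimeDifferenceWithNoGrace` and `StreamsConfig.EXACTLY_ONCE_V2`, and the 10-minute window is an arbitrary choice for illustration.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

import java.time.Duration;
import java.util.Properties;

// Hypothetical example: pair each order with its payment.
public class OrderPaymentJoinApp {

    static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-payment-join-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        return props;
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        KStream<String, String> payments = builder.stream("payments");

        // Inner join: emit a combined record when an order and a payment with the
        // same key arrive within 10 minutes of each other. Both sides are buffered
        // in windowed state stores so late-arriving partners can still be matched.
        KStream<String, String> paidOrders = orders.join(
            payments,
            (order, payment) -> order + " | " + payment,
            JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(10))
        );

        paidOrders.to("paid-orders");
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = config();
        // Exactly-once for the read-process-write cycle (requires brokers 2.5 or later).
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Because this is an inner join, orders without a matching payment inside the window produce no output; `leftJoin` or `outerJoin` would keep them. The windowed buffers on both sides are exactly the kind of state a plain consumer would have to manage by hand.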

Basic Kafka Streams Java Example

Here is how to build a simple stream-processing topology that filters records from one topic and writes the matches to another:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class CompletedOrdersApp {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "completed-orders-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();

        // Read from the source topic "orders"
        KStream<String, String> orders = builder.stream("orders");

        // Stateless filter: keep only "completed" orders (skipping null values)
        KStream<String, String> completedOrders = orders.filter(
            (key, value) -> value != null && value.contains("STATUS:COMPLETED")
        );

        // Write the results to the outbound sink topic
        completedOrders.to("completed-orders-destination");

        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        // Close the Streams instance cleanly when the JVM shuts down
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Conclusion

Start with a lightweight **Kafka Consumer** for simple, stateless ingestion tasks. As soon as you need to aggregate data, join topics, or track time-based windows, move to the **Kafka Streams API**, whose built-in state stores (RocksDB by default) and KStream/KTable abstractions manage that state for you.