How to Ensure Zero Data Loss in Apache Kafka

In financial services, healthcare records, and audit log processing, losing even a single message is unacceptable. By default, Apache Kafka balances speed against reliability. However, with the right settings across producers, brokers, and consumers, you can tune Kafka for **zero data loss** (absolute durability).

In this guide, we provide a concrete checklist of properties you must set to guarantee that messages are never lost in transit.

[Figure: Zero data loss configuration flow diagram in Kafka]

Real-World Analogy: High-Security Bank Transport

Imagine a bank transferring physical cash between vault facilities:

Instead of sending one truck, the bank uses **three armored trucks carrying duplicate boxes (Replication Factor = 3)**. The bank dispatcher demands that at least **two trucks must arrive safely and co-sign the receipt (min.insync.replicas = 2)** before marking the transfer as successful (acks = all).

If the lead truck gets a flat tire, the bank does not promote a random bicycle courier who never carried the full load to lead the remaining trucks (unclean.leader.election.enable = false). Finally, the receiving vault clerk only logs the deposit in the main ledger after the money is physically counted and stored in the safe (manual offset commits).

Zero Data Loss Configuration Checklist

To guarantee messages are never lost at any stage of their journey, you must coordinate settings across all three parts of the pipeline:

1. Producer Configuration

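Producers must wait for full acknowledgment and keep retrying transient network errors. Retries are not truly unbounded: a send still fails once delivery.timeout.ms (120000 ms by default) expires, so the application must handle that failure: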

acks=all: The producer waits for the leader broker and all current in-sync followers to write the message before receiving success.
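enable.idempotence=true: Ensures that retried sends do not write duplicate records. Idempotence requires acks=all, retries greater than 0, and max.in.flight.requests.per.connection of at most 5. In recent Kafka versions the producer defaults (retries=2147483647, max.in.flight.requests.per.connection=5) already satisfy these constraints.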

# Producer properties for data durability
acks=all
enable.idempotence=true
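
As a minimal sketch, the same settings can be assembled in Java as a plain java.util.Properties object. The class name DurableProducerConfig, the method name, and the localhost:9092 address are placeholders of ours, and the String serializers are an assumption about the message types. The key strings are the standard producer configuration names, and delivery.timeout.ms is written out at its default value only to make the retry bound visible.

```java
import java.util.Properties;

public class DurableProducerConfig {

    // Builds producer settings aimed at durability.
    // bootstrapServers is supplied by the caller; nothing here contacts a broker.
    public static Properties durableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait for the leader and all in-sync followers.
        props.setProperty("acks", "all");
        // Retried sends must not create duplicate records.
        props.setProperty("enable.idempotence", "true");
        // Explicit values that satisfy the idempotence constraints.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("max.in.flight.requests.per.connection", "5");
        // Upper bound on how long a send (including retries) may take.
        props.setProperty("delivery.timeout.ms", "120000");
        // Assumption: keys and values are plain strings.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = durableProducerProps("localhost:9092");
        // Print the settings in a stable order for inspection.
        props.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

With the kafka-clients library on the classpath, this object can be passed to the KafkaProducer constructor. Even with these settings, the result of each send() should be checked through its callback or returned Future, because a record that cannot be delivered before delivery.timeout.ms expires is reported as a failure rather than retried forever.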

2. Broker / Topic Configuration

The cluster must have sufficient redundancy to survive server crashes:

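replication.factor=3: Ensures there are three copies of each partition log, each on a different broker. The replication factor is chosen per topic at creation time; the broker setting default.replication.factor covers automatically created topics.
min.insync.replicas=2: Enforces that at least two replicas (the leader and one follower) must confirm a write sent with acks=all. If only the leader is in sync, the broker rejects such writes, preventing data loss if that leader then crashes. This setting has no effect on producers that use acks=0 or acks=1.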
unclean.leader.election.enable=false: Prevents out-of-sync replicas from ever being elected as partition leaders, avoiding silent data loss at the cost of partition availability.
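
To make the interaction between these settings concrete, here is a toy model of the rule described above. It is not Kafka code, and the class and method names are our own. It assumes a producer using acks=all and shows that a write is accepted only while the in-sync replica set (leader included) has at least min.insync.replicas members, which means the replication factor minus min.insync.replicas is the number of replica failures a partition can absorb while staying writable.

```java
public class IsrWriteModel {

    // With acks=all, the leader accepts a write only if the in-sync replica set
    // (leader included) has at least minInsyncReplicas members.
    public static boolean acceptsAcksAllWrite(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    // Number of replica failures a partition can absorb while still
    // accepting acks=all writes.
    public static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        return Math.max(0, replicationFactor - minInsyncReplicas);
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int minIsr = 2;
        // Walk through losing replicas one at a time.
        for (int alive = replicationFactor; alive >= 1; alive--) {
            System.out.println("in-sync replicas=" + alive
                    + " -> write accepted: " + acceptsAcksAllWrite(alive, minIsr));
        }
        System.out.println("tolerable failures: "
                + tolerableFailures(replicationFactor, minIsr));
    }
}
```

With a replication factor of 3 and min.insync.replicas=2, one replica can fail without blocking producers. A second failure makes acks=all writes fail (the producer sees a NotEnoughReplicasException) instead of accepting data that exists on a single broker. Setting min.insync.replicas equal to the replication factor would make any single broker outage block writes, which is why 2 of 3 is the usual compromise.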

3. Consumer Configuration

Consumers must only commit offsets after records have been processed successfully:

enable.auto.commit=false: Disable automatic time-based offset commits.
Process-then-Commit pattern: Write your code to process the record (e.g., store in database, invoke third-party service) *first*, and then call commitSync() or commitAsync().

// Zero-loss manual offset commit loop
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        processBusinessLogic(record); // Process first
    }
    consumer.commitSync(); // Commit offset only after successful processing
}
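
Why the commit order matters can be shown with a small simulation. This is a toy model rather than Kafka client code: offsets are plain integers, a "crash" is an early return, and the names CommitOrderDemo and runUntilCrash are hypothetical. It compares committing a batch before processing it with the process-then-commit pattern used above.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOrderDemo {

    // Simulates one consumer run over offsets [committed, total), handled in batches.
    // The run "crashes" when it reaches offset crashAt (that record is NOT processed);
    // pass -1 for a run without a crash.
    // Returns the committed offset that a restarted consumer would resume from.
    public static int runUntilCrash(int committed, int total, int batchSize, int crashAt,
                                    boolean commitBeforeProcessing, List<Integer> processed) {
        int position = committed;
        while (position < total) {
            int batchEnd = Math.min(position + batchSize, total);
            if (commitBeforeProcessing) {
                committed = batchEnd; // offsets committed before the work is done
            }
            for (int offset = position; offset < batchEnd; offset++) {
                if (offset == crashAt) {
                    return committed; // crash: nothing after this point runs
                }
                processed.add(offset);
            }
            if (!commitBeforeProcessing) {
                committed = batchEnd; // process-then-commit
            }
            position = batchEnd;
        }
        return committed;
    }

    public static void main(String[] args) {
        for (boolean early : new boolean[] {true, false}) {
            List<Integer> processed = new ArrayList<>();
            // First run: 10 records, batches of 5, crash while handling offset 7.
            int resumeAt = runUntilCrash(0, 10, 5, 7, early, processed);
            // Restarted consumer resumes from the committed offset, no crash this time.
            runUntilCrash(resumeAt, 10, 5, -1, early, processed);
            System.out.println((early ? "commit-then-process" : "process-then-commit")
                    + " processed offsets: " + processed);
        }
    }
}
```

In this model, committing first leaves offsets 7 to 9 unprocessed forever, because the restarted consumer resumes at offset 10. Process-then-commit resumes at offset 5, so every record is handled, but offsets 5 and 6 are handled twice. The pattern therefore gives at-least-once delivery, and processBusinessLogic should be idempotent or otherwise tolerate duplicates.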

Summary Table of Delivery Guarantees

Setting	Value	Durability Benefit
acks	all	Confirms data is safely on multiple brokers.
min.insync.replicas	2	Prevents write if too few replicas are alive.
unclean.leader.election.enable	false	Never elects out-of-sync followers.
enable.auto.commit	false	Consumer manually controls the offset commit point.

Conclusion

Achieving zero data loss is not about a single magic configuration, but a coordinated design. By combining acks=all and min.insync.replicas=2 with a replication factor of at least 3, keeping unclean leader election disabled, and committing offsets manually only after processing, you build a resilient messaging architecture suitable for mission-critical enterprise systems.