CentralMesh.io

Kafka Fundamentals for Beginners

5.5 Message Guarantees

At-least-once, at-most-once, and exactly-once delivery semantics.


Overview

Kafka offers three message delivery guarantees that balance reliability, performance, and data consistency. Each guarantee makes a different trade-off between duplicate delivery, data loss, and speed.

At-Least-Once Delivery

Default Guarantee

  • Every message sent by producer is guaranteed to be delivered at least once
  • Possibility of duplicates exists
  • Kafka's default delivery guarantee

Why Duplicates Occur

  • Producer doesn't receive acknowledgment from broker
  • Producer assumes message was lost and retries
  • Original message might have already been written successfully
  • Results in duplicate messages in topic
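This retry-after-a-missed-ack sequence can be sketched with a toy broker in plain Java (no Kafka required; BrokerStub and its behavior are illustrative assumptions, not Kafka APIs):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the broker writes the record, but the first ack is "lost" in transit.
class BrokerStub {
    final List<String> log = new ArrayList<>();   // the topic's committed records
    private boolean firstAckDropped = false;

    // Returns true if the producer received an ack for this send.
    boolean send(String record) {
        log.add(record);                          // the write itself always succeeds here
        if (!firstAckDropped) {
            firstAckDropped = true;
            return false;                         // ack lost: producer will retry
        }
        return true;
    }
}

public class AtLeastOnceDemo {
    public static void main(String[] args) {
        BrokerStub broker = new BrokerStub();
        String record = "order-42";
        int retries = 3;
        boolean acked = broker.send(record);
        while (!acked && retries-- > 0) {
            acked = broker.send(record);          // retry the same record
        }
        // Both writes landed, so the topic now holds a duplicate.
        System.out.println(broker.log);           // [order-42, order-42]
    }
}
```

The producer did nothing wrong: it could not distinguish "write lost" from "ack lost", so retrying was the only safe choice, and the duplicate is the price of that safety.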

Producer Configuration

java
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, "3");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

Configuration Details:

  • acks=all: Leader waits for all in-sync replicas to confirm before acknowledging
  • retries=3: Producer retries up to three times on transient failures
  • Improves durability but adds latency
  • Helps prevent message loss but can introduce duplicates

Delivery Timeout

  • Default timeout: 120 seconds (delivery.timeout.ms=120000)
  • Message expires if not acknowledged in time
  • Kafka logs a warning and fails the send when a message is discarded
  • Example warning: "Expiring 1 record(s) for my-topic-0: 120000 ms has passed since batch creation"

Avoiding Timeouts

  • Increase delivery.timeout.ms for more time
  • Optimize network conditions
  • Adjust acks and retries to balance durability and performance
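The timeout-related settings can be sketched with the literal config names (the same strings behind the ProducerConfig constants used elsewhere on this page); the values here are illustrative, not recommendations. Note that Kafka requires delivery.timeout.ms to be at least request.timeout.ms + linger.ms, otherwise the producer refuses to start:

```java
import java.util.Properties;

public class TimeoutTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Give in-flight batches more time before they are expired.
        props.setProperty("delivery.timeout.ms", "180000"); // total time budget per record
        // Constraint: delivery.timeout.ms >= request.timeout.ms + linger.ms
        props.setProperty("request.timeout.ms", "30000");   // per-request broker wait
        props.setProperty("linger.ms", "5");                // small batching delay
        props.setProperty("acks", "all");                   // durability over latency
        props.setProperty("retries", "3");                  // retry transient failures
        System.out.println(props.getProperty("delivery.timeout.ms"));
    }
}
```

Raising the delivery timeout buys retries more time at the cost of discovering failures later; tune it together with acks and retries rather than in isolation.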

Consumer Handling

  • Consumer needs to implement deduplication logic
  • Use unique message IDs to track processed messages
  • Track processed offsets
  • Ensures duplicate messages don't cause issues
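One common shape for this deduplication logic is an in-memory set of seen message IDs (the class and method names here are illustrative; a production system would typically persist the seen set, or key it in an external store, so it survives restarts):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Idempotent-consumer sketch: skip records whose ID was already processed.
public class Deduplicator {
    private final Set<String> seenIds = new HashSet<>();
    private final List<String> processed = new ArrayList<>();

    // Returns true if the record was processed, false if it was a duplicate.
    public boolean process(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return false;              // already handled: drop the duplicate
        }
        processed.add(payload);        // real processing work would go here
        return true;
    }

    public List<String> processed() { return processed; }

    public static void main(String[] args) {
        Deduplicator d = new Deduplicator();
        System.out.println(d.process("id-1", "payload-a"));  // true
        System.out.println(d.process("id-1", "payload-a"));  // false: duplicate dropped
    }
}
```

With this guard in place, a redelivered message is harmless: the second copy is recognized by its ID and ignored, which is what makes at-least-once delivery safe in practice.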

At-Most-Once Delivery

Characteristics

Advantages:

  • No duplicates - each message sent at most once
  • Fastest and simplest delivery guarantee
  • Minimal latency

Trade-offs:

  • Occasional data loss possible
  • Messages lost if failures occur before processing
  • Less reliable than other guarantees

Use Cases

  • Real-time monitoring where missing single event acceptable
  • High-volume, low-criticality data
  • Scenarios prioritizing speed over reliability
  • Applications tolerating occasional data loss

Producer Configuration

java
props.put(ProducerConfig.ACKS_CONFIG, "0");
props.put(ProducerConfig.RETRIES_CONFIG, "0");
props.put(ProducerConfig.LINGER_MS_CONFIG, "0");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

Configuration Details:

  • acks=0: Producer doesn't wait for broker confirmation
  • retries=0: No retries on failure - messages just lost
  • linger.ms=0: Messages sent immediately, no batching

How Data Loss Occurs

  • Producer side: with acks=0, a message lost in transit is never resent
  • Consumer side: offsets are committed before messages are processed
  • If the consumer crashes after the commit but before processing, the message is never processed
  • Simplest but least reliable guarantee
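The commit-before-process failure mode can be simulated without a broker (all names here are illustrative; the static fields stand in for a consumer group's stored offset and the application's side effects):

```java
import java.util.ArrayList;
import java.util.List;

public class AtMostOnceDemo {
    static int committedOffset = 0;              // stands in for the stored consumer offset
    static List<String> processed = new ArrayList<>();

    public static void main(String[] args) {
        List<String> partition = List.of("m0", "m1", "m2");
        boolean crashed = false;
        for (int i = committedOffset; i < partition.size() && !crashed; i++) {
            committedOffset = i + 1;             // commit FIRST (at-most-once style)
            if (i == 1) {                        // simulated crash before processing m1
                crashed = true;
                continue;
            }
            processed.add(partition.get(i));     // processing happens after the commit
        }
        // On restart the consumer resumes at committedOffset = 2, so m1 is skipped forever.
        System.out.println(processed);           // [m0]
        System.out.println(committedOffset);     // 2
    }
}
```

The offset says "m1 is done" even though it never was, which is exactly the window in which at-most-once delivery loses data.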

Exactly-Once Delivery

Overview

  • Most reliable guarantee
  • Every message delivered once and only once
  • No duplicates, no data loss
  • Absolute accuracy over speed

Requirements

Two Key Features:

  1. Idempotent Producers
    • Ensures same message isn't written multiple times
    • Works even if retries happen
    • Prevents duplicates automatically
  2. Transactions
    • Groups multiple operations into single atomic unit
    • Ensures consistency
    • Either all messages written or none at all
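These two features map directly onto producer configs. A minimal sketch using the literal config names (the value of transactional.id is illustrative; since Kafka 3.0, enable.idempotence defaults to true, and setting transactional.id enables it implicitly):

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");      // feature 1: idempotent writes
        props.setProperty("transactional.id", "orders-tx-1"); // feature 2: transactions
        props.setProperty("acks", "all");                     // required with idempotence
        System.out.println(props.getProperty("transactional.id"));
    }
}
```

The transactional.id must be stable across restarts of the same producer instance; Kafka uses it to fence off zombie producers from an earlier incarnation.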

Complexity and Performance

  • Higher complexity than other guarantees
  • Performance costs involved
  • Used when absolute accuracy is critical
  • Worth the overhead for mission-critical data
Producer Configuration

java
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-id");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Initialize transactions
producer.initTransactions();

// Begin transaction
producer.beginTransaction();

// Send messages
producer.send(new ProducerRecord<>("topic", "key", "value1"));
producer.send(new ProducerRecord<>("topic", "key", "value2"));

// Commit or abort
try {
    producer.commitTransaction();
} catch (Exception e) {
    producer.abortTransaction();
}

Transaction Flow

  1. Initialize transactions
    • Call initTransactions() before using transactions
    • One-time setup per producer
  2. Begin transaction
    • Call beginTransaction() to start an atomic batch
    • Signals the start of the transaction scope
  3. Send messages
    • Send multiple messages as part of the transaction
    • All messages are grouped together
  4. Commit or abort
    • Success: commitTransaction() writes all messages
    • Failure: abortTransaction() rolls back all messages
    • Prevents duplicates and partial writes

Consumer Configuration

java
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

Why This Matters:

  • Default (read_uncommitted): consumers read uncommitted messages
  • May pick up data that is later rolled back
  • read_committed: only processes successfully committed messages
  • Prevents duplicate or inconsistent data
  • Essential for exactly-once semantics

Summary

Delivery Guarantees:

  1. At-Least-Once (Default)
    • Guaranteed delivery with possible duplicates
    • acks=all, retries=3
    • Requires consumer deduplication
  2. At-Most-Once
    • Fast, no duplicates, possible data loss
    • acks=0, retries=0
    • Use for non-critical data
  3. Exactly-Once
    • No duplicates, no data loss
    • Requires idempotent producers and transactions
    • isolation.level=read_committed for consumers
    • Use for mission-critical accuracy

Choose based on your application's requirements for reliability, performance, and data consistency.