CentralMesh.io

Kafka Fundamentals for Beginners

4.1 Creating and Managing Topics

Commands and configuration for managing Kafka topics.


Overview

Managing Kafka topics requires a functional Kafka environment with Zookeeper and at least one broker. This lesson covers the essential commands and configurations for creating and managing topics using Docker and the Kafka CLI.

Environment Setup

Docker Compose Configuration

We use Docker Compose to simplify the setup process with a single configuration file that defines:

  1. Zookeeper Configuration: Coordinator for Kafka, helping brokers stay in sync
  2. Kafka Broker 1: First broker with its unique advertised name
  3. Kafka Broker 2: Second broker for multi-broker demonstration

Each broker advertises itself with a unique name like 'kafka1' or 'kafka2', which is critical for client connections.
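A minimal compose file along these lines would work. The images, versions, and environment variables below are illustrative assumptions, not the lesson's verbatim file; adjust them to your setup:

```yaml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka1:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # The advertised name must match the hosts-file entry clients will resolve
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      # With only two brokers, internal topics cannot use the default factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 2

  kafka2:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    ports:
      - "9093:9093"
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 2
```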

Starting the Environment

bash
docker-compose up -d

Kafka CLI Setup

Download and Extract Kafka

  1. Visit the Apache Kafka Downloads page
  2. Download the binary matching your setup (e.g., kafka_2.13-3.x.x, where 2.13 is the Scala version)
  3. Extract the contents:
    bash
    tar -xvzf kafka_2.13-3.x.x.tgz
    cd kafka_2.13-3.x.x

DNS Configuration

Since Kafka brokers in Docker advertise themselves by names like 'kafka1' and 'kafka2', you need to map these names to localhost.

Linux/Mac Setup

Edit the hosts file:

bash
sudo vim /etc/hosts

Add these lines:

text
127.0.0.1 kafka1
127.0.0.1 kafka2

Windows Setup

  1. Open Notepad as Administrator
  2. Open the file: C:\Windows\System32\drivers\etc\hosts
  3. Add the same lines as above
  4. Save the file

Verify DNS Setup

Test the configuration:

bash
ping kafka1
ping kafka2

You should see replies from 127.0.0.1 for both.

Connecting to Brokers

Use the --bootstrap-server option to specify which broker to connect to:

bash
kafka-topics.sh --list --bootstrap-server kafka1:9092,kafka2:9093

Creating Topics

Basic Create Command

Create a topic with specific parameters:

bash
kafka-topics.sh --create --topic my_topic \
  --bootstrap-server kafka1:9092,kafka2:9093 \
  --partitions 3 \
  --replication-factor 2

Command Parameters

  • --topic: Name identifier for your data stream
  • --bootstrap-server: Brokers to connect to (comma-separated)
  • --partitions: Number of partitions for parallel processing
  • --replication-factor: Number of replicas for redundancy

Behind the Scenes

When you create a topic:

  1. Zookeeper Coordination: Manages metadata about the topic
  2. Leader Selection: Kafka assigns partition leaders across brokers
  3. Replica Distribution: Replicas are distributed for fault tolerance
  4. Metadata Update: Cluster metadata is updated with new topic information

Example: Payment Topic

bash
kafka-topics.sh --create --topic payment \
  --bootstrap-server kafka1:9092,kafka2:9093 \
  --partitions 2 \
  --replication-factor 2

This creates:

  • A topic named 'payment'
  • 2 partitions for parallel processing
  • 2 replicas per partition for redundancy

Listing Topics

View all topics in the cluster:

bash
kafka-topics.sh --list --bootstrap-server kafka1:9092

Describing Topics

Get detailed information about a specific topic:

bash
kafka-topics.sh --describe --topic payment \
  --bootstrap-server kafka1:9092

This shows:

  • Partition count
  • Replication factor
  • Leader for each partition
  • In-Sync Replicas (ISR)
  • Replica distribution
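The --describe output resembles the following. This is an illustrative sketch, not captured output; leader and replica broker IDs will vary with your cluster:

```text
Topic: payment	PartitionCount: 2	ReplicationFactor: 2	Configs:
	Topic: payment	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: payment	Partition: 1	Leader: 2	Replicas: 2,1	Isr: 2,1
```

If the Isr list is shorter than the Replicas list, one of the replicas has fallen behind or its broker is down.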

Deleting Topics

Remove a topic:

bash
kafka-topics.sh --delete --topic my_topic \
  --bootstrap-server kafka1:9092

Note: Topic deletion requires delete.topic.enable=true in the broker configuration (this is the default in recent Kafka versions).

Best Practices

Partition Count

  • Consider your throughput requirements
  • More partitions = more parallelism
  • Balance between parallelism and overhead
  • Typical range: 1-100 partitions per topic
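A common sizing heuristic is to divide the topic's target throughput by the throughput a single partition sustains, rounding up. The numbers below are illustrative assumptions, not benchmarks; measure your own per-partition throughput:

```shell
#!/bin/sh
# Rough partition-count estimate: ceil(target / per-partition throughput)
target_mb_s=50          # required write throughput for the topic (assumed)
per_partition_mb_s=10   # throughput one partition sustains (assumed)
partitions=$(( (target_mb_s + per_partition_mb_s - 1) / per_partition_mb_s ))
echo "$partitions"
```

Leaving some headroom above this estimate is cheaper than repartitioning later, since adding partitions changes key-to-partition mapping.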

Replication Factor

  • Minimum: 2 for production
  • Recommended: 3 for critical data
  • Must be ≤ number of brokers
  • Higher replication = better fault tolerance but more storage

Naming Conventions

  • Use descriptive names (e.g., 'payment', 'user-events')
  • Avoid special characters except hyphens and underscores
  • Consider naming hierarchy (e.g., 'app.domain.event-type')

Topic Configuration

Retention Settings

Control how long data is retained:

bash
kafka-topics.sh --create --topic my_topic \
  --bootstrap-server kafka1:9092 \
  --config retention.ms=86400000  # 24 hours
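Since retention.ms is in milliseconds, it is easy to miscount zeros; computing the value is safer. A quick sketch:

```shell
#!/bin/sh
# Convert a retention window in hours to the milliseconds Kafka expects
hours=24
retention_ms=$((hours * 60 * 60 * 1000))
echo "$retention_ms"
```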

Compression

Enable compression for better storage efficiency:

bash
kafka-topics.sh --create --topic my_topic \
  --bootstrap-server kafka1:9092 \
  --config compression.type=lz4

Common Configuration Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| retention.ms | How long to keep messages (milliseconds) | 7 days (604800000) |
| retention.bytes | Maximum size per partition | Unlimited (-1) |
| compression.type | Compression algorithm | producer (keep the producer's codec) |
| cleanup.policy | delete or compact | delete |
| min.insync.replicas | Minimum ISR for writes | 1 |

Troubleshooting

Cannot Connect to Broker

  • Verify DNS configuration (hosts file)
  • Check broker is running: docker ps
  • Verify port accessibility
  • Check advertised listener configuration

Topic Creation Fails

  • Ensure sufficient brokers for replication factor
  • Check broker logs for errors
  • Verify Zookeeper is running and accessible
  • Check permissions and ACLs if security is enabled

Partition Rebalancing

After adding brokers, use kafka-reassign-partitions.sh to redistribute partitions evenly across the cluster.
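The reassignment tool takes a JSON plan describing the desired replica placement. A minimal sketch; the topic, partition, and broker IDs here are illustrative:

```json
{
  "version": 1,
  "partitions": [
    { "topic": "payment", "partition": 0, "replicas": [1, 2] },
    { "topic": "payment", "partition": 1, "replicas": [2, 1] }
  ]
}
```

Pass the plan with --reassignment-json-file and --execute, then re-run the same command with --verify to confirm the moves completed.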

Summary

Topic management is fundamental to working with Kafka. Key takeaways:

  • Use Docker for easy environment setup
  • Configure DNS for broker name resolution
  • Choose partition count based on parallelism needs
  • Set replication factor for desired fault tolerance
  • Apply appropriate retention and compression settings
  • Monitor and adjust configurations as needed

Proper topic configuration ensures optimal performance, reliability, and maintainability of your Kafka cluster.