
In this post I will explain how to add a separate Kafka cluster (three ZooKeeper servers and three Kafka brokers on new hosts) to an existing Cloudera Manager cluster.

Prerequisites:

  1. Three new hosts or EC2 instances.
  2. Java installed.
  3. SELinux disabled.
  4. RAM – 64 GB 
  5. Java Heap – 4 GB
  6. CPU – 12-24 cores

For other details, refer to the Cloudera documentation.

We want to set up a cluster of three ZooKeeper servers and three Kafka brokers, so first we add ZooKeeper on all three hosts.

First, make sure that all three hosts have been added to Cloudera Manager (using Add Hosts) and that the Cloudera Manager agent is installed on each of them.

Once your hosts are visible in Cloudera Manager, go to Cluster > Action > Add Service, select ZooKeeper, and click Continue.

Select the hosts where you want to install the ZooKeeper service (in our case, all three hosts) and start ZooKeeper on all of them. Once the ZooKeeper servers are up and running, we can install Kafka on the same three hosts.
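Why three ZooKeeper servers? A ZooKeeper ensemble keeps working only while a strict majority of its servers is up. The sketch below is a minimal illustration of that majority arithmetic; the function names are hypothetical helpers, not part of any ZooKeeper or Cloudera API.

```python
def zk_quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still available."""
    return ensemble_size - zk_quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum={zk_quorum_size(n)}, "
              f"tolerates {zk_tolerated_failures(n)} failure(s)")
```

Three servers need a quorum of two and so survive one failure. A fourth server raises the quorum to three without tolerating any additional failures, which is why ensembles are usually sized with an odd number of servers.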

Install Kafka Parcel: We install Kafka through a parcel, so first:

  • Login to Cloudera manager web UI.
  • Navigate to Hosts > Parcels
  • Click Configuration
  • Add the Kafka parcel URL (PARCEL_URL) to the list under Remote Parcel Repository URLs
  • Save Changes

You will be taken back to the Parcels page. Wait a few seconds and the version of Kafka that you entered should be added to the list.

  • Locate the Kafka parcel from the list
  • Under Actions, click Download and wait for it to download
  • Under Actions, click Distribute and wait for it to be distributed
  • Under Actions, click Activate and wait for it to be activated.
    Note: make sure the proper Cloudera repo file is present at /etc/yum.repos.d/cloudera-cdh.repo.

Install Kafka Service:

  • Log in to the Cloudera Manager Web UI
  • Click the button next to the cluster name and select “Add Service”
  • Select “Kafka” and click “Continue”
  • Select whichever set of dependencies you would like and click “Continue”
  • Assign the Kafka Broker and Gateway roles to the three new hosts and click “Continue”
  • Keep the default configurations and click Continue
  • The service will now be added and then you will be taken back to the CM home.

Configure Kafka Service:

  • Log in to the Cloudera Manager Web UI
  • Click Kafka > Configuration
  • Replication factor: Kafka Service > Configuration > Service-Wide. Set the default replication factor (default.replication.factor) to 3, click Save Changes, and restart the Kafka service.
  • Unclean leader election: To enable unclean leader election, navigate to Kafka Service > Configuration > Service-Wide. Check the box labeled “Enable unclean leader election”, click Save Changes, and restart the Kafka service. Enabling it lets an out-of-sync replica become leader, which favors availability over durability: messages not yet replicated to the new leader can be lost.
  • Minimum in-sync replicas: Kafka Service > Configuration > Service-Wide. Set “Minimum number of replicas in ISR” (min.insync.replicas) to 2, click Save Changes, and restart the Kafka service. If min.insync.replicas is 2 and the producer uses acks=all, each message must be written successfully to at least two replicas before it is acknowledged, so an acknowledged message is not lost unless both of those brokers fail.
  • Java heap: Kafka Service > Configuration. Set Java Heap Size of Broker (broker_max_heap_size) to 4 GB.
  • Save and restart Kafka service.
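To see how the replication factor of 3 and min.insync.replicas of 2 interact, here is a minimal sketch of the acks=all rule for a single partition. It assumes a simplified model (one replica per broker, and every failed replica has dropped out of the ISR while all others are in sync); the helper names are hypothetical and not part of the Kafka API.

```python
def can_accept_acks_all(replication_factor: int, min_insync_replicas: int,
                        failed_replicas: int) -> bool:
    """Simplified model: can an acks=all write to one partition still succeed?

    Assumes failed replicas are out of the ISR and all others are in sync.
    """
    if not 0 <= failed_replicas <= replication_factor:
        raise ValueError("failed_replicas must be between 0 and replication_factor")
    in_sync = replication_factor - failed_replicas
    return in_sync >= min_insync_replicas


def tolerated_replica_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replica failures a partition survives while still accepting acks=all writes."""
    return max(0, replication_factor - min_insync_replicas)


if __name__ == "__main__":
    # The settings used in this post: replication factor 3, min.insync.replicas 2.
    for failed in range(4):
        print(failed, "failed ->", can_accept_acks_all(3, 2, failed))
```

With these settings a partition keeps accepting acks=all writes after one broker failure; after a second failure the broker rejects such writes (NotEnoughReplicas) rather than acknowledge a message stored on only one replica.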

Important configuration files and paths: note down the following paths and files.

  • Data Directories (log.dirs): A list of one or more directories in which Kafka data is stored. Each new partition is placed in the directory that currently holds the fewest partitions. Each directory should be on its own drive.
  • kafka.log4j.dir: The directory where the Kafka Broker role writes its log files.
  • Kafka client configuration on all hosts:
    /etc/kafka/conf/kafka-client.conf
  • Broker listener port: 9092
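The placement rule for log.dirs (new partitions go to the directory with the fewest partitions) can be sketched as follows. This is an illustrative model only: the directory paths are hypothetical, ties here go to the first directory listed (an assumption, not a claim about Kafka's own tie-breaking), and the rule counts partitions, not bytes, so a directory with a few very large partitions still receives new ones.

```python
from typing import Dict, List


def choose_log_dir(partition_counts: Dict[str, int]) -> str:
    """Return the directory holding the fewest partitions (first one wins ties)."""
    return min(partition_counts, key=partition_counts.__getitem__)


def place_partitions(partition_counts: Dict[str, int], new_partitions: int) -> List[str]:
    """Assign new partitions one at a time, updating the counts as we go."""
    counts = dict(partition_counts)  # do not mutate the caller's dict
    placements = []
    for _ in range(new_partitions):
        target = choose_log_dir(counts)
        counts[target] += 1
        placements.append(target)
    return placements


if __name__ == "__main__":
    # Hypothetical log.dirs entries with their current partition counts.
    dirs = {"/data1/kafka": 2, "/data2/kafka": 0, "/data3/kafka": 1}
    print(place_partitions(dirs, 5))
```

Starting from counts of 2, 0 and 1, five new partitions land in /data2, /data2, /data3, /data1, /data2, leaving three partitions in each directory.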

Verification: 

  • Verify the broker process on each Kafka host by running the following command on that host:
    $ ps -fu kafka
    This lists the processes running as the kafka user, which should include the Kafka broker.
  • To verify that all brokers are registered with the same ZooKeeper, connect to ZooKeeper using zookeeper-client and list the broker IDs:
    $ zookeeper-client
    $ ls /brokers/ids
    You should see the IDs of all the brokers registered in your Kafka cluster.
    To discover which host a particular ID is assigned to, append that ID to the path in the following command:
    $ get /brokers/ids/
    This returns the registration data of the broker with that ID, including its host name.
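If you want to check the ZooKeeper output in a script, the sketch below parses the two pieces shown above: the bracketed ID list printed by `ls /brokers/ids` and the JSON stored under a broker's ID. It assumes you have already isolated those lines from the zookeeper-client session; the sample JSON is illustrative (field names vary across Kafka versions), and the helper names are hypothetical.

```python
import json
from typing import List


def parse_broker_ids(ls_output: str) -> List[int]:
    """Parse the list printed by `ls /brokers/ids`, e.g. "[1, 2, 3]"."""
    return sorted(int(x) for x in json.loads(ls_output))


def broker_address(registration_json: str) -> str:
    """Return host:port from a broker's registration JSON."""
    data = json.loads(registration_json)
    endpoints = data.get("endpoints") or []
    if endpoints:
        # Endpoints look like "PLAINTEXT://host1:9092"; drop the listener prefix.
        return endpoints[0].split("://", 1)[1]
    # Fall back to the older top-level host/port fields.
    return f"{data['host']}:{data['port']}"


if __name__ == "__main__":
    ids = parse_broker_ids("[1, 2, 3]")  # illustrative output
    if len(ids) != 3:
        raise SystemExit(f"expected 3 brokers, found {ids}")
    sample = ('{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},'
              '"endpoints":["PLAINTEXT://host1:9092"],"jmx_port":-1,'
              '"host":"host1","timestamp":"1500000000000","port":9092,"version":4}')
    print(ids, broker_address(sample))
```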

Smoke Test:

kafka-topics --zookeeper host1:2181 --create --topic test --partitions 1 --replication-factor 1
kafka-topics --zookeeper host2:2181 --list
# Run the consumer and producer in separate windows.
# Type in text to the producer and watch it appear in the consumer.
# ^C to quit.
kafka-console-consumer --zookeeper host1:2181 --topic test
kafka-console-producer --broker-list host1:9092 --topic test
