Apache Kafka is a widely used distributed event streaming platform for processing streaming data in real time. It is developed by the Apache Software Foundation and is a popular choice for building real-time streaming data pipelines and applications. In this tutorial, we will walk you through the step-by-step process of installing Apache Kafka on an Ubuntu 22.04 server.
Prerequisites
Before we begin, make sure you meet the following requirements:
- An Ubuntu 22.04 server with at least 2 GB of memory (4 GB is recommended).
- A non-root user with sudo privileges.
Installing Java OpenJDK
Before installing Apache Kafka, we need to install Java OpenJDK on our Ubuntu system. Apache Kafka is written in Scala and Java, and at the time of this writing it requires at least Java 11.
To install Java OpenJDK 11, open a terminal and run the following commands:
sudo apt update
sudo apt install default-jdk
Once the installation is complete, verify that Java is installed correctly by running the following command:
java -version
You should see that OpenJDK 11, the default JDK on Ubuntu 22.04, is installed on your system.
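The output should look similar to the following; the exact version and build strings are illustrative and will vary with your system’s package updates:

openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment (build 11.0.19+7-post-Ubuntu-0ubuntu122.04.1)
OpenJDK 64-Bit Server VM (build 11.0.19+7-post-Ubuntu-0ubuntu122.04.1, mixed mode, sharing)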
Installing Apache Kafka
Now that we have Java OpenJDK installed, let’s proceed with the installation of Apache Kafka. We will install it manually using the binary package.
- Create a new system user named “kafka” with the following command:
sudo useradd -r -d /opt/kafka -s /usr/sbin/nologin kafka
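Here the -r flag creates a system account, -d sets /opt/kafka as its home directory, and -s /usr/sbin/nologin blocks interactive logins. You can confirm the account was created as intended:

# show the kafka account’s entry (UID, home directory, shell)
getent passwd kafka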
- Download the Apache Kafka binary package using the following command:
sudo curl -fsSLo kafka.tgz https://dlcdn.apache.org/kafka/3.2.0/kafka_2.13-3.2.0.tgz
- Extract the downloaded package and move it to the “/opt/kafka” directory:
tar -xzf kafka.tgz
sudo mv kafka_2.13-3.2.0 /opt/kafka
- Change the ownership of the Kafka installation directory to the user “kafka”:
sudo chown -R kafka:kafka /opt/kafka
- Create a logs directory for Kafka and edit the Kafka configuration file:
sudo -u kafka mkdir -p /opt/kafka/logs
sudo -u kafka nano /opt/kafka/config/server.properties
In the configuration file, change the default location for Kafka logs to “/opt/kafka/logs”:
log.dirs=/opt/kafka/logs
Save and close the file.
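If you prefer a non-interactive edit, a single sed command makes the same change; this sketch assumes the file still contains the stock log.dirs line:

# point log.dirs at the new logs directory, then verify the change
sudo -u kafka sed -i 's|^log.dirs=.*|log.dirs=/opt/kafka/logs|' /opt/kafka/config/server.properties
grep '^log.dirs' /opt/kafka/config/server.properties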
Setting Up Apache Kafka as a Service
Now that Apache Kafka is installed, we will set it up as a systemd service. This will allow us to start, stop, and restart Kafka using the systemctl command.
To set up Kafka as a service, we need to set up the ZooKeeper service first. ZooKeeper is used by Kafka to manage controller election, topic configuration, access control lists (ACLs), and cluster membership.
- Create a new systemd service file for ZooKeeper:
sudo nano /etc/systemd/system/zookeeper.service
Add the following configuration to the file:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save and close the file.
- Create a new service file for Apache Kafka:
sudo nano /etc/systemd/system/kafka.service
Add the following configuration to the file:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /opt/kafka/logs/start-kafka.log 2>&1'
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save and close the file.
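Before starting the services, you can optionally let systemd sanity-check both unit files; systemd-analyze will flag misspelled directives or missing executables:

sudo systemd-analyze verify /etc/systemd/system/zookeeper.service /etc/systemd/system/kafka.service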
- Reload the systemd manager to apply the new services:
sudo systemctl daemon-reload
- Start and enable the ZooKeeper service:
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
- Start and enable the Apache Kafka service:
sudo systemctl enable kafka
sudo systemctl start kafka
- Verify the status of the ZooKeeper and Apache Kafka services:
sudo systemctl status zookeeper
sudo systemctl status kafka
You should see that both services are enabled and running.
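As an extra check, confirm that both daemons are listening on their default TCP ports (2181 for ZooKeeper, 9092 for Kafka):

# list listening sockets and filter for the ZooKeeper and Kafka ports
sudo ss -tlnp | grep -E ':2181|:9092'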
Basic Apache Kafka Operation
Now that Apache Kafka is installed and running, let’s explore some of the basic operations you can perform with Kafka.
Creating a Kafka Topic
To create a new Kafka topic, use the kafka-topics.sh script. This script allows you to create, list, and delete topics.
Open a terminal and run the following command to create a new topic named “TestTopic” with a replication factor of 1 and a single partition:
sudo -u kafka /opt/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic TestTopic
You should see the output “Created topic TestTopic” indicating that the topic was created successfully.
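To inspect the partition count, replication factor, and leader assignment of the new topic, you can also use the --describe flag:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic TestTopic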
Verifying Topics
To verify the list of available topics on your Kafka server, run the following command:
sudo -u kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
You should see the “TestTopic” listed as one of the available topics.
Kafka Console Producer and Consumer
The Kafka Console Producer and Consumer are command-line utilities that allow you to write and stream data to and from Kafka topics.
To start the Kafka Console Producer, use the following command:
sudo -u kafka /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TestTopic
This will open an interactive prompt where you can type messages to be sent to the “TestTopic” topic. (Recent Kafka releases use --bootstrap-server in place of the deprecated --broker-list flag.)
To start the Kafka Console Consumer, open another terminal and run the following command:
sudo -u kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TestTopic --from-beginning
This will start streaming messages from the “TestTopic” topic and display them in the terminal.
You can now type messages in the Kafka Console Producer shell, and they will automatically appear in the Kafka Console Consumer shell.
To stop the Kafka Console Producer and Consumer, press “Ctrl + C”.
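Both tools also work non-interactively, which is useful for quick smoke tests from scripts. The sketch below pipes a single message into the producer, then reads one message back; the --max-messages and --timeout-ms options make the consumer exit on its own instead of waiting for Ctrl + C:

# produce one message without opening an interactive prompt
echo "Scripted test message" | sudo -u kafka /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic TestTopic
# consume a single message, giving up after 10 seconds if none arrives
sudo -u kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TestTopic --from-beginning --max-messages 1 --timeout-ms 10000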
Deleting a Kafka Topic
If you want to delete a Kafka topic, you can use the kafka-topics.sh script.
Run the following command to delete the “TestTopic” topic:
sudo -u kafka /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic TestTopic
The “TestTopic” will be removed from your Kafka server.
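To confirm the deletion, list the topics again; “TestTopic” should no longer appear:

sudo -u kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092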
Import/Export Your Data as a Stream using Kafka Connect Plugin
Kafka Connect is a framework included with Apache Kafka that allows you to import and export data streams between Kafka and external systems, using connectors that are loaded as plugins.
- Edit the Kafka Connect configuration file:
sudo -u kafka nano /opt/kafka/config/connect-standalone.properties
Add the following line to enable the bundled FileStream connector plugin (the path is relative, so Kafka Connect must be started from the /opt/kafka directory):
plugin.path=libs/connect-file-3.2.0.jar
Save and close the file.
- Create an example file to import and stream to Kafka:
echo -e "Test message from file\nTest using Kafka connect from file" | sudo -u kafka tee /opt/kafka/test.txt
- Start the Kafka Connect in standalone mode:
cd /opt/kafka
sudo -u kafka /opt/kafka/bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
This starts Kafka Connect in standalone mode and streams the data from the “test.txt” file to the Kafka topic specified in the connector configuration files.
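For reference, the two bundled connector configuration files are short; as shipped with Kafka 3.2 they look roughly like the following, but check your copies under /opt/kafka/config since defaults can change between releases:

# connect-file-source.properties: read test.txt and publish each line to the connect-test topic
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test

# connect-file-sink.properties: consume connect-test and append each record to test.sink.txt
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test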
- Open another terminal and start the Kafka Console Consumer to see the streamed data:
sudo -u kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
You will now see the data from the “test.txt” file being streamed to the Kafka Console Consumer.
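Since the sink connector mirrors the topic back to a file, you can also verify the round trip on disk; the file names below assume the stock connector configuration shown above:

# the sink connector writes consumed records to test.sink.txt in the Kafka directory
cat /opt/kafka/test.sink.txt
# appending a line to the source file is picked up and streamed automatically
echo "Another test message" | sudo -u kafka tee -a /opt/kafka/test.txt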
Conclusion
In this tutorial, we have covered the step-by-step process of installing Apache Kafka on an Ubuntu 22.04 server. We have also explored the basic configuration and operation of Apache Kafka, including creating topics, producing and consuming messages, and using the Kafka Connect plugin to import and export data streams. Apache Kafka is a powerful tool for building real-time streaming data pipelines, and with this guide, you should be well-equipped to get started with Kafka on your Ubuntu server.
For more information on Apache Kafka and how it can benefit your business, visit Shape.host. Shape.host provides reliable and scalable Cloud VPS solutions, including cloud hosting for Apache Kafka.